Abstract

The goal of selecting that one of $k \ge 2$ Bernoulli populations which has the largest single-trial "success" probability $p_{[k]} = \max\{p_1, \ldots, p_k\}$ is treated. Consideration is restricted to procedures which take no more than n observations from any one of the k populations. One such procedure is the single-stage procedure of Sobel and Huyett [1957], which takes exactly n observations from each of the k populations. We propose a one-at-a-time adaptive sampling rule (R*) which, when used in conjunction with a particular stopping rule (S*) and terminal decision rule (T*), achieves the same probability of a correct selection as does the single-stage procedure uniformly in $\mathbf{p} = (p_1, \ldots, p_k)$. Letting N denote the random total number of observations to terminate sampling using the procedure (R*,S*,T*), we show that $n \le N \le kn - 1$; for $p_{[k]} \to 0$ we have $P\{N = kn - 1 \mid \mathbf{p}\} \to 1$, while for $p_{[1]} \to 1$ we have $P\{N = n \mid \mathbf{p}\} \to 1$. For k = 2 the sampling rule R* (the conjugate sampling rule $\bar{R}^*$), which is stationary, is optimal in the sense that it minimizes $E\{N \mid (p_1, p_2)\}$ uniformly in $(p_1, p_2)$ for $p_1 + p_2 \ge 1$ ($p_1 + p_2 \le 1$) among all sampling rules which use (S*,T*) and which take no more than n observations from either population; R* has additional optimal properties for k = 2. The procedure (R*,S*,T*) is generalized for k > 2 to accommodate such goals as "Selecting the s ($1 \le s \le k-1$) 'best' Bernoulli populations with regard to order," and is shown to have desirable properties for these goals as well. Some conjectures are made concerning the optimality of (R*,S*,T*) for k > 2. The performance of (R*,S*,T*) is compared for k > 2 with that of other sequential selection procedures that have been proposed in the literature. An extensive bibliography is included.


1.  Introduction 


Let $\Pi_i$ ($1 \le i \le k$) denote $k \ge 2$ Bernoulli populations with corresponding single-trial "success" probabilities $p_i$. Denote the ordered values of the $p_i$ by $p_{[1]} \le \cdots \le p_{[k]}$; the values of the $p_{[j]}$ and the pairing of the $\Pi_i$ with the $p_{[j]}$ ($1 \le i, j \le k$) are assumed to be completely unknown.

Statistical procedures for the problem of selecting the "best" population, i.e., the one associated with $p_{[k]}$, have received considerable attention in recent years. In a fundamental paper, Sobel and Huyett [1957] proposed a single-stage procedure employing the indifference-zone approach of Bechhofer [1954] with the "distance measure" $\Delta_{i,j} = p_i - p_j$; their procedure was shown by Hall [1959] to have the optimum property of being "most economical" among single-stage procedures. Paulson [1967], [1969], using the distance measures $\Delta_{i,j}$ and $p_i/p_j$, proposed the first sequential procedure for this problem. His open procedure permitted the elimination of "non-contending" populations; it employed a fixed number of stages with a random number of observations per stage, the total number of observations (N) required for termination being an unbounded random variable. Bechhofer, Kiefer and Sobel [1968], Section 12.6.1.4, using the distance measure $p_i(1-p_j)/[p_j(1-p_i)]$ (and $\Delta_{i,j}$), also proposed an open sequential procedure employing a vector-at-a-time (VT) sampling rule.

Spurred on by the potential of application of such methods in clinical trials and related areas, there followed a period of considerable research activity focusing on sequential procedures for this problem; these studies were spearheaded initially by Milton Sobel, George Weiss, David Hoel and their collaborators: Sobel and Weiss [1970], [1971a], [1971b], [1972a], [1972b]; Kiefer and Weiss [1971], [1974]; Hoel [1972]; Hoel and Sobel [1972]; Hoel, Sobel and Weiss [1972]; Nebenzahl and Sobel [1972]. During the period 1973-1980 a large number of additional papers appeared; all employed the distance measure $\Delta_{i,j}$ (except Taheri and Young [1974], who used $p_i/p_j$). These papers are listed among our references. An excellent review of many of these proposed procedures (and others), with particular reference to adaptive sampling for clinical trials, is contained in Hoel, Sobel and Weiss [1975b]. A recent text by Büringer, Martin and Schriever [1980] gives an in-depth, comprehensive survey of these procedures (and many additional ones); it treats their derivation, performance characteristics, and uses, and provides extensive tables for their implementation.

Concurrently,  the  Bernoulli  selection  problem  was  studied  employing  the 
subset  approach  of  Gupta  [1956].  The  early  key  papers  using  this  approach 
are Gupta, Huyett and Sobel [1957] and Gupta and Sobel [1960]; an up-to-date
summary  of  more  recent  results  using  the  subset  approach  is  contained  in 
Gupta  and  Panchapakesan  [1979],  Section  13.2. 

The problem of allocating observations among treatments when the total available number of observations is fixed (fixed patient horizon), with the objective of assigning a higher proportion of the total number of available observations to the population with the larger success probability, has been studied in the medical context by Armitage [1960, 1975], Anscombe [1963], Colton [1963], Cornfield, Halperin and Greenhouse [1969], Zelen [1969], and Canner [1970], among others. For comments concerning this formulation of the problem see Sobel and Weiss [1972b].

A somewhat related class of procedures directed toward solutions of the so-called 2-armed (or multi-armed) bandit problem was investigated by many research workers: Robbins [1952], [1956]; Bradt, Johnson and Karlin [1956]; Isbell [1959]; Feldman [1962]; Smith and Pyke [1965]; Fabius and van Zwet [1970]; Berry [1972], [1978]; and Rodman [1978], among others. These papers are not concerned with the Bernoulli selection problem, but rather focus on minimizing or maximizing appropriate objective functions, the principal tool used being dynamic programming.

2.  The  k-population  Bernoulli  selection  problem 
2.1  Earlier  approaches 

Before we describe our objectives and approach, it will be helpful to sketch the chronological development of certain statistical aspects of the Bernoulli selection problem. It is perhaps of historical interest to note that the Sobel-Huyett [1957] and Gupta-Huyett-Sobel [1957] papers made no reference to the potential applicability of their procedures to the drug selection problem or to clinical trials. Such a reference appears first in Paulson [1967] (although Armitage [1960, 1975], Anscombe [1963] and Colton [1963] had earlier considered such applications). Sobel and Weiss [1970] treated the special case k = 2, and emphasized the desirability of minimizing the number of patients on the poorer treatment. With this objective in mind they studied the performance of the play-the-winner (PW) sampling rule (introduced earlier by Robbins [1952], [1956], and proposed specifically for clinical trials by Zelen [1969]). The two procedures studied by Sobel-Weiss [1970] employed PW and VT sampling rules, the latter having been proposed earlier by Bechhofer, Kiefer and Sobel [1968] (B-K-S); both procedures suffered from the fact that the expected total number of observations (E{N}) required to terminate experimentation approached infinity, for both PW and VT, in certain limiting configurations of $p_{[1]}$ and $p_{[2]}$. To overcome this problem for VT for k = 2, Kiefer and Weiss [1971] suggested a truncated version of the B-K-S VT-procedure, and permitted the possibility of a third terminal decision, i.e., "The two populations are essentially the same." (Later, Kiefer and Weiss [1974] proposed an analogous truncated version of the Sobel-Weiss PW sampling rule.) The procedures of Sobel and Weiss [1971b] for k = 2 and [1972a] for $k \ge 3$, which employed PW sampling and a stopping rule based on inverse sampling, also were vulnerable to $p_{[k]} \to 0$, since then $E\{N\} \to \infty$.

The Sobel-Weiss [1972a] procedure was the first (after Paulson [1967], [1969]) to consider the case $k \ge 3$ for the distance measure $\Delta_{i,j}$.

Although most investigators studied only the k = 2 case, Hoel and Sobel [1972], Sobel and Weiss [1972b], Hoel, Sobel and Weiss [1975a], and Schriever [1978/79] considered the $k \ge 3$ case. All restricted consideration to the goal of selecting the "best" population.

Further work on closed procedures for k = 2 was carried out by Hoel [1972], Nebenzahl and Sobel [1972], Berry and Sobel [1973], Fushimi [1973], Kiefer and Weiss [1974], Simon, Weiss and Hoel [1975], Schriever [1979] and Tamhane [1981]. Bofinger [1978] and Schriever [1978/79] appear to be the only authors to have considered closed procedures for $k \ge 3$. Most of these procedures employed some variant of PW sampling rules designed to minimize $E\{N\}$ and/or $E\{N_{(1)}\}$, the expected number of observations taken from the population associated with $p_{[1]}$.


2.2  Our  approach 

In this paper we have limited consideration to closed procedures, i.e., procedures for which the total number of observations taken from any of the $k \ge 2$ populations is a bounded random variable. We are disenchanted with open procedures because we believe that they are of little practical use. (This, of course, is a criticism of all of the ranking and identification procedures described in Bechhofer, Kiefer and Sobel [1968], and, in a hypothesis testing or acceptance sampling context, of the Wald sequential probability ratio test.) Even if $E\{N\}$ is "small" relative to the kn required by the best competing single-stage procedure, the distribution of N is usually highly skewed to the right, and hence "large" values of N occur with positive (albeit small) probabilities. This fact discourages the use of such procedures.

Our  reference  point  is  the  single-stage  procedure  of  Sobel  and  Huyett 
[1957], which takes exactly n observations from each of the $k \ge 2$
populations.  We  were  able  to  characterize  a  class  of  closed  sequential 
procedures  which  achieve  the  same  probability  of  a  correct  selection  as  does 
the  Sobel-Huyett  (S-H)  procedure,  uniformly  in  p.  Within  this  class  we 
have  found  adaptive  procedures  which  are  uniformly  in  p  superior  in  terms 
of  E{N}  to  the  S-H  procedure.  For  k  =  2  our  procedure  is  optimal 
within  a  certain  class. 

Our closed sequential procedures for $k \ge 2$ are applicable to a broad class of general ranking and selection goals such as the one described in equation (6) of Bechhofer [1954], namely, "To select the $k_t$ 'best' populations, the $k_{t-1}$ 'second best' populations, etc., and finally the $k_1$ 'worst' populations." Here $k_1, k_2, \ldots, k_t$ ($t \le k$) are positive integers such that $\sum_{i=1}^{t} k_i = k$. To illustrate our procedure we consider (in Sections 3.1 and 4.1) the case $t = 2$, $k_1 = k - s$, $k_2 = s$ ($1 \le s \le k-1$), which we call Goal I, and (in Sections 3.2 and 4.2) the case $t = s + 1$, $k_1 = k - s$, $k_2 = k_3 = \cdots = k_{s+1} = 1$ ($1 \le s \le k-1$), which we call Goal II. Other goals not given by (6) in Bechhofer [1954] can be handled similarly.

The  main  difference  between  our  present  formulation  of  the  problem  and 
that  adopted  in  all  of  the  previous  papers  in  this  category  is  that  the 
so-called  "least-favorable  configuration"  of  the  p-values  (which  plays  a 
central  role  when  designing  an  experiment  using  the  indifference-zone  approach) 
is of no concern to us. Our interest is focused on the probability of achieving a correct selection for a given n for the particular goal considered, and on accomplishing this objective with minimum cost (e.g., minimum $E\{N\}$ needed to achieve a correct selection). A special virtue of all of our procedures for $k \ge 2$ is that no special tables of constants are necessary to carry out the procedures, and the procedures are very easy to implement.

We  assume  throughout  that  the  response  (success  or  failure)  of  an 
experiment  is  known  sufficiently  soon  that  it  can  influence  the  choice  of 
population  for  the  next  experiment.  This  condition  is  not  met  in  most 
clinical  trials  (although  it  often  can  be  realized  in  testing  in  the  physical 
sciences). Even if this condition is not met, the procedures can sometimes be used to advantage. (See Remark 5.3.) Also, modifications of the procedure can be made to good effect if the responses are delayed.


3.  Single-stage  procedures 


In this section we consider single-stage procedures for the Goal I and Goal II Bernoulli selection problems. Let $S_i$ ($F_i$) denote a "success" ("failure") from $\Pi_i$ ($1 \le i \le k$). If n observations are taken from $\Pi_i$, let $y_{i,n}$ denote the number of successes yielded by $\Pi_i$ ($1 \le i \le k$).


3.1 Single-stage procedure for Goal I

PROCEDURE FOR GOAL I (Selecting the s ($1 \le s \le k-1$) "best" of k populations without regard to order):

Sampling rule ($R_{SS}$): Take n observations from each of the k populations. (3.1a)

Terminal decision rule ($T_{SS}$): Compute $y_{i,n}$ ($1 \le i \le k$). Let $A_1, A_2 \subset A = \{1, 2, \ldots, k\}$ denote two disjoint sets of order s and $k - s$, respectively, such that

$$y_{i_1,n} \ge y_{i_2,n} \qquad (3.1b)$$

for all $i_1 \in A_1$ and for all $i_2 \in A_2$. If there are r sets $\mathcal{A}^{(i)} = \{A_1^{(i)}, A_2^{(i)}\}$ ($1 \le i \le r$) satisfying (3.1b), then select one of them at random and announce for the selected set that $A_1, A_2$ are associated with $\{p_{[k]}, p_{[k-1]}, \ldots, p_{[k-s+1]}\}$ and $\{p_{[k-s]}, \ldots, p_{[1]}\}$, respectively.
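The terminal decision rule $T_{SS}$ for Goal I can be sketched in Python by enumerating the candidate sets $A_1$. This is an illustrative sketch only: the function names, the 0-based list input `y` and the 1-based population labels in the output are assumptions of the sketch, and exhaustive enumeration is practical only for small k.

```python
import random
from itertools import combinations


def goal1_single_stage_sets(y, s):
    """Return every split (A1, A2) that satisfies (3.1b).

    y[i] holds y_{i+1,n}, the success count of population i+1; labels in the
    result are 1-based, as in the text.  A1 has s members, A2 has k - s.
    """
    labels = range(1, len(y) + 1)
    splits = []
    for a1 in combinations(labels, s):
        a2 = tuple(i for i in labels if i not in a1)
        # (3.1b): every count in A1 is at least every count in A2
        if min(y[i - 1] for i in a1) >= max(y[i - 1] for i in a2):
            splits.append((a1, a2))
    return splits


def goal1_single_stage_decision(y, s, rng=random):
    """T_SS for Goal I: choose one qualifying split at random."""
    return rng.choice(goal1_single_stage_sets(y, s))


# Hypothetical counts for k = 5, n = 3, s = 3
print(goal1_single_stage_sets([3, 3, 2, 2, 0], 3))
# -> [((1, 2, 3), (4, 5)), ((1, 2, 4), (3, 5))]
```

Because populations 3 and 4 tie at the boundary between $A_1$ and $A_2$, two splits qualify and $T_{SS}$ picks one of them at random.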

3.2  Single-stage  procedure  for  Goal  II 

PROCEDURE FOR GOAL II (Selecting the s ($1 \le s \le k-1$) "best" of k populations with regard to order):

Sampling rule ($R_{SS}$): Take n observations from each of the k populations. (3.2a)

Terminal decision rule ($T_{SS}$): Compute $y_{i,n}$ ($1 \le i \le k$). Let $A_1, A_2, \ldots, A_{s+1} \subset A = \{1, 2, \ldots, k\}$ denote s + 1 disjoint sets, $A_1, \ldots, A_s$ of order one and $A_{s+1}$ of order $k - s$, such that

$$y_{i_j,n} \ge y_{i_{j+1},n} \quad (1 \le j \le s) \qquad (3.2b)$$

for $i_j \in A_j$ ($1 \le j \le s$) and for all $i_{s+1} \in A_{s+1}$. If there are r sets $\mathcal{A}^{(i)} = \{A_1^{(i)}, A_2^{(i)}, \ldots, A_{s+1}^{(i)}\}$ ($1 \le i \le r$) satisfying (3.2b), then select one of them at random and announce for the selected set that $A_1, A_2, \ldots, A_s$ and $A_{s+1}$ are associated with $p_{[k]}, p_{[k-1]}, \ldots, p_{[k-s+1]}$ and $\{p_{[k-s]}, \ldots, p_{[1]}\}$, respectively.


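Example 3.1 (Goal II; k = 5, s = 3, n = 3): Suppose the three observations from each population are

Observation 1: $S_1$ $S_2$ $S_3$ $S_4$ $F_5$
Observation 2: $S_1$ $S_2$ $S_3$ $S_4$ $F_5$
Observation 3: $S_1$ $S_2$ $F_3$ $F_4$ $F_5$

so that $y_{1,3} = y_{2,3} = 3$, $y_{3,3} = y_{4,3} = 2$ and $y_{5,3} = 0$. Then the sets

$\mathcal{A}^{(1)} = \{\{1\}, \{2\}, \{3\}, \{4,5\}\}$,
$\mathcal{A}^{(2)} = \{\{1\}, \{2\}, \{4\}, \{3,5\}\}$,
$\mathcal{A}^{(3)} = \{\{2\}, \{1\}, \{3\}, \{4,5\}\}$,
$\mathcal{A}^{(4)} = \{\{2\}, \{1\}, \{4\}, \{3,5\}\}$

all satisfy (3.2b). Hence, select one of the $\mathcal{A}^{(i)}$ ($1 \le i \le 4$) at random.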
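The Goal II rule (3.2b) can likewise be sketched by enumerating ordered choices of $i_1, \ldots, i_s$. Applied to the success counts (3, 3, 2, 2, 0) implied by the outcomes in Example 3.1, it reproduces the four sets $\mathcal{A}^{(1)}, \ldots, \mathcal{A}^{(4)}$ in the same order. The function name and the (top, rest) output format are assumptions of this sketch, which again is meant only for small k.

```python
from itertools import permutations


def goal2_single_stage_sets(y, s):
    """Return every ordered choice satisfying (3.2b).

    Each result is (top, rest): top = (i_1, ..., i_s) lists the populations
    announced as best, second best, ..., and rest is A_{s+1}.  Labels are
    1-based; y[i] holds the success count of population i+1.
    """
    labels = range(1, len(y) + 1)
    result = []
    for top in permutations(labels, s):
        rest = tuple(i for i in labels if i not in top)
        # y_{i_j,n} >= y_{i_{j+1},n} for consecutive selected populations
        chain_ok = all(y[top[j] - 1] >= y[top[j + 1] - 1] for j in range(s - 1))
        # the s-th selected population must not trail any member of A_{s+1}
        tail_ok = y[top[-1] - 1] >= max(y[i - 1] for i in rest)
        if chain_ok and tail_ok:
            result.append((top, rest))
    return result


# Success counts from Example 3.1 (k = 5, s = 3, n = 3)
for top, rest in goal2_single_stage_sets([3, 3, 2, 2, 0], 3):
    print(top, rest)
# (1, 2, 3) (4, 5)
# (1, 2, 4) (3, 5)
# (2, 1, 3) (4, 5)
# (2, 1, 4) (3, 5)
```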

Remark 3.1: The single-stage procedures given in this section for Goals I
and  II  coincide  for  s  =  1.  The  case  s  =  1  was  studied  in  detail  by  Sobel 
and  Huyett  [1957].  In  that  paper  the  common  sample  size  n  was  chosen  to 
guarantee  certain  indifference-zone  probability  requirements  (as  in  Bechhofer 
[1954])  given  by  their  equations  (5)  and  (13). 

4. A class of sequential procedures

We now propose a class of sequential procedures for the Goal I and Goal II Bernoulli selection problems. Let $S_i^m$ ($F_i^m$) denote a success (failure) from $\Pi_i$ at stage m ($1 \le i \le k$, $1 \le m \le kn$). Let $n_{i,m}$ denote the total number of observations taken from $\Pi_i$ through stage m, and let $z_{i,m}$ denote the total number of successes yielded by $\Pi_i$ through stage m ($1 \le i \le k$, $1 \le m \le kn$).

Theorem 5.1 (in Section 5.1) relates to a class of sequential selection procedures which employ a very general class of sampling rules, and a particular stopping and terminal decision rule specific to the goal (Goal I or Goal II) under consideration.

Throughout  the  remainder  of  this  paper  we  shall  let  R  denote  an 
arbitrary  sampling  rule  which  takes  no  more  than  n  observations  from  any 
of  the  k  populations.  The  basis  for  specifying  n  (e.g.,  to  guarantee 
an  indifference-zone  probability  requirement  as  in  Sobel  and  Huyett  or  because 
of  availability  of  observations  or  because  of  other  economic  considerations) 
is  of  no  concern  to  us  here. 

4.1 A class of sequential procedures for Goal I

PROCEDURE FOR GOAL I (Selecting the s ($1 \le s \le k-1$) "best" of k populations without regard to order):

Sampling rule (R): Arbitrary, the only restriction being that at most n observations can be taken from any of the k populations. Thus, e.g., one-at-a-time sampling, play-the-winner sampling, vector-at-a-time sampling, or multistage sampling can be used. (4.1a)

Stopping rule (S*): Stop sampling at the first stage m at which there exist two disjoint sets $A_1, A_2 \subset A = \{1, 2, \ldots, k\}$, with $A_1$ of order s and $A_2$ of order $k - s$, such that

$$z_{i_1,m} \ge z_{i_2,m} + n - n_{i_2,m} \qquad (4.1b)$$

for all $i_1 \in A_1$ and for all $i_2 \in A_2$. (The right-hand side is the largest number of successes that $\Pi_{i_2}$ could attain if all of its remaining $n - n_{i_2,m}$ observations were successes.)

Terminal decision rule (T*): If r sets $\mathcal{A}^{(i)} = \{A_1^{(i)}, A_2^{(i)}\}$ ($1 \le i \le r$) satisfy (4.1b), then select one of them at random and announce for the selected set that $A_1$ and $A_2$ are associated with $\{p_{[k]}, \ldots, p_{[k-s+1]}\}$ and $\{p_{[k-s]}, \ldots, p_{[1]}\}$, respectively. (4.1c)
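A direct transcription of the Goal I stopping rule (4.1b) is sketched below; S* says "stop" exactly when the returned list is non-empty, and T* would then pick one entry at random. The names `goal1_stop_sets`, `z` and `obs` are hypothetical, and the brute-force enumeration assumes k is small.

```python
from itertools import combinations


def goal1_stop_sets(z, obs, n, s):
    """Return every split (A1, A2) satisfying (4.1b) at the current stage.

    z[i]:   successes z_{i+1,m} observed so far from population i+1
    obs[i]: observations n_{i+1,m} taken so far from population i+1
    An empty result means S* says "continue sampling".
    """
    k = len(z)
    labels = range(1, k + 1)
    # z + n - n_{i,m}: most successes population i could still reach
    ceiling = [z[i] + n - obs[i] for i in range(k)]
    splits = []
    for a1 in combinations(labels, s):
        a2 = tuple(i for i in labels if i not in a1)
        if min(z[i - 1] for i in a1) >= max(ceiling[i - 1] for i in a2):
            splits.append((a1, a2))
    return splits


# k = 3, s = 1, n = 4: population 2 has 4 successes in 4 trials,
# populations 1 and 3 have not been sampled yet.
print(goal1_stop_sets([0, 4, 0], [0, 4, 0], n=4, s=1))
# -> [((2,), (1, 3))]
```

The usage example shows that the weak inequality allows stopping even though two populations have never been sampled.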


4.2 A class of sequential procedures for Goal II

PROCEDURE FOR GOAL II (Selecting the s ($1 \le s \le k-1$) "best" of k populations with regard to order):

Sampling rule (R): Arbitrary, the only restriction being that at most n observations can be taken from any of the k populations. (4.2a)

Stopping rule (S*): Stop sampling at the first stage m at which there exist s + 1 disjoint sets $A_1, A_2, \ldots, A_s, A_{s+1} \subset A = \{1, 2, \ldots, k\}$, with $A_1, \ldots, A_s$ of order one and $A_{s+1}$ of order $k - s$, such that

$$z_{i_j,m} \ge z_{i_{j+1},m} + n - n_{i_{j+1},m} \quad (1 \le j \le s) \qquad (4.2b)$$

for $i_j \in A_j$ ($1 \le j \le s$) and for all $i_{s+1} \in A_{s+1}$.

Terminal decision rule (T*): If r sets $\mathcal{A}^{(i)} = \{A_1^{(i)}, A_2^{(i)}, \ldots, A_{s+1}^{(i)}\}$ ($1 \le i \le r$) satisfy (4.2b), then select one of them at random and announce for the selected set that $A_1, A_2, \ldots, A_s$ and $A_{s+1}$ are associated with $p_{[k]}, p_{[k-1]}, \ldots, p_{[k-s+1]}$ and $\{p_{[k-s]}, \ldots, p_{[1]}\}$, respectively. (4.2c)
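The Goal II stopping rule (4.2b) differs from (4.1b) only in requiring a chain of comparisons among the s selected populations. A minimal sketch follows, assuming the same hypothetical `z`/`obs` layout as for Goal I; the state used in the usage line is made up for illustration.

```python
from itertools import permutations


def goal2_stop_sets(z, obs, n, s):
    """Return every ordered choice (top, rest) satisfying (4.2b).

    top = (i_1, ..., i_s) and rest = A_{s+1}, with 1-based labels.
    z[i] and obs[i] are the success and observation counts of population i+1.
    """
    k = len(z)
    labels = range(1, k + 1)
    ceiling = [z[i] + n - obs[i] for i in range(k)]
    result = []
    for top in permutations(labels, s):
        rest = tuple(i for i in labels if i not in top)
        # z_{i_j,m} >= z_{i_{j+1},m} + n - n_{i_{j+1},m} for consecutive picks
        chain_ok = all(z[top[j] - 1] >= ceiling[top[j + 1] - 1] for j in range(s - 1))
        # the s-th pick must dominate every population left in A_{s+1}
        tail_ok = z[top[-1] - 1] >= max(ceiling[i - 1] for i in rest)
        if chain_ok and tail_ok:
            result.append((top, rest))
    return result


# Hypothetical state: k = 3, s = 2, n = 2
print(goal2_stop_sets([2, 1, 0], [2, 1, 2], n=2, s=2))
# -> [((1, 2), (3,))]
```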


Example  4.4:  For  (k  =  5,  s  =  3,  n  =  3) ,  stop  if 
Then $\mathcal{A}^{(1)} = \{\{1\}, \{3\}, \{2\}, \{4,5\}\}$ and $\mathcal{A}^{(2)} = \{\{1\}, \{3\}, \{4\}, \{2,5\}\}$. Hence, select one of the $\mathcal{A}^{(i)}$ ($i = 1, 2$) at random.


5. Comparison of some performance characteristics of the single-stage and sequential procedures

In  this  section  we  compare  the  probability  of  a  correct  selection 
achieved  by  our  class  of  sequential  procedures  with  that  of  the  corresponding 
single-stage  procedures.  We  do  the  same  for  the  total  number  of  observations 
required  to  terminate  experimentation  for  the  sequential  procedures  and  the 
total  sample  size  required  by  the  corresponding  single-stage  procedures. 


5.1 Probability of correct selection

If two or more populations have a common p-value, assume that the populations are tagged in such a way that the ordering of the k populations is unique. Then a correct selection (CS) for Goal I is achieved if the selected sets $A_1, A_2$ are associated with $\{p_{[k]}, p_{[k-1]}, \ldots, p_{[k-s+1]}\}$ and $\{p_{[k-s]}, \ldots, p_{[1]}\}$, respectively; analogously, a correct selection for Goal II is achieved if the selected sets $A_1, A_2, \ldots, A_{s+1}$ are associated with $\{p_{[k]}\}, \{p_{[k-1]}\}, \ldots, \{p_{[k-s+1]}\}$ and $\{p_{[k-s]}, \ldots, p_{[1]}\}$, respectively.

We now state our first key theorem relating the P{CS} achieved by our sequential procedures and the P{CS} achieved by the corresponding single-stage procedures.


Theorem 5.1:

$$P_{\mathbf{p}}\{CS \mid (R_{SS}, T_{SS})\} = P_{\mathbf{p}}\{CS \mid (R, S^*, T^*)\} \quad \text{uniformly in } \mathbf{p} = (p_1, \ldots, p_k)$$

for Goal I, and analogously for Goal II.


Proof:  See  Appendix  A. 
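As a numerical illustration only, the sketch below checks Theorem 5.1 in the smallest case (Goal I with k = 2, s = 1) by exact recursion over the states $(z_{1,m}, n_{1,m}; z_{2,m}, n_{2,m})$. The function names are hypothetical, and the check covers just two sampling rules: R* of (6.1) and an assumed "fewer observations first" rule, both of which treat the two populations symmetrically. Agreement in a few cases is of course not a proof.

```python
from functools import lru_cache
from math import comb


def single_stage_p2(p1, p2, n):
    """P{Pi_2 is selected} by the single-stage rule, ties broken at random."""
    def pmf(p, y):
        return comb(n, y) * p**y * (1 - p) ** (n - y)

    total = 0.0
    for y1 in range(n + 1):
        for y2 in range(n + 1):
            w = pmf(p1, y1) * pmf(p2, y2)
            total += w if y2 > y1 else (0.5 * w if y2 == y1 else 0.0)
    return total


def sequential_p2(rule, p1, p2, n):
    """P{Pi_2 is selected} by (rule, S*, T*) for k = 2, s = 1, by exact recursion."""

    @lru_cache(maxsize=None)
    def prob(z1, n1, z2, n2):
        winners = []  # populations for which (4.1b) currently holds
        if z1 >= z2 + n - n2:
            winners.append(1)
        if z2 >= z1 + n - n1:
            winners.append(2)
        if winners:  # S* stops; T* chooses at random among qualifying sets
            return winners.count(2) / len(winners)
        total = 0.0
        for pop, w in rule(z1, n1, z2, n2).items():
            if pop == 1:
                total += w * (p1 * prob(z1 + 1, n1 + 1, z2, n2)
                              + (1 - p1) * prob(z1, n1 + 1, z2, n2))
            else:
                total += w * (p2 * prob(z1, n1, z2 + 1, n2 + 1)
                              + (1 - p2) * prob(z1, n1, z2, n2 + 1))
        return total

    return prob(0, 0, 0, 0)


def r_star(z1, n1, z2, n2):
    """Rule R* of (6.1) as a distribution over the next population."""
    f1, f2 = n1 - z1, n2 - z2
    if f1 < f2 or (f1 == f2 and z1 > z2):
        return {1: 1.0}
    if f1 > f2 or (f1 == f2 and z1 < z2):
        return {2: 1.0}
    return {1: 0.5, 2: 0.5}


def balanced(z1, n1, z2, n2):
    """Hypothetical rule: sample whichever population has fewer observations."""
    if n1 != n2:
        return {1: 1.0} if n1 < n2 else {2: 1.0}
    return {1: 0.5, 2: 0.5}


for p1, p2 in [(0.3, 0.6), (0.75, 0.9), (0.1, 0.2)]:
    print(p1, p2, single_stage_p2(p1, p2, 5),
          sequential_p2(r_star, p1, p2, 5), sequential_p2(balanced, p1, p2, 5))
```

In each printed row the three probabilities agree up to floating-point rounding.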
Remark 5.1: Note that if the weak inequality in (4.1b) and (4.2b) were replaced by a strict inequality, the associated stopping rules would involve curtailment of the sampling process. Then the conclusion of Theorem 5.1 would be obvious, since the resulting sequential procedure and the single-stage procedure always lead to the same terminal decision. However, such is not the case when the weak inequality is used. For example, for $k \ge 2$, s = 1, $n \ge 1$, we see that (4.1b) calls for stopping if (say) the sequence $S_i^j$ ($1 \le j \le n$) were obtained for any i ($1 \le i \le k$), in which situation (4.1c) would select $\Pi_i$. However, for that same initial sequence, curtailed sampling would require that at least one more observation be taken from all populations $\Pi_j$ ($j \ne i$, $1 \le j \le k$); if these additional observations were such that a total of r - 1 additional populations also yielded n S's, then the curtailment terminal decision rule would select one of these r n-success populations at random (which is what the single-stage procedure would do).

Thus (4.1b) and (4.1c) permit earlier stopping than under curtailment, but sometimes may lead to a different terminal decision than under curtailment.
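The difference between the weak inequality of (4.1b) and the strict (curtailment) version can be seen in a few lines of Python for s = 1. This is an illustrative sketch; `s1_candidates` and its arguments are hypothetical names, and the scenario mirrors the one described in Remark 5.1 with k = 3 and n = 3.

```python
def s1_candidates(z, obs, n, strict=False):
    """Populations (1-based) that S* with s = 1 would allow T* to select.

    strict=False uses the weak inequality of (4.1b); strict=True gives the
    curtailment version discussed in Remark 5.1.
    """
    k = len(z)
    winners = []
    for i in range(k):
        # best success total any competitor could still reach
        rival = max(z[j] + n - obs[j] for j in range(k) if j != i)
        ok = z[i] > rival if strict else z[i] >= rival
        if ok:
            winners.append(i + 1)
    return winners


n = 3
# Pi_1 has produced n straight successes; Pi_2 and Pi_3 are untouched.
print(s1_candidates([3, 0, 0], [3, 0, 0], n))               # [1]: weak rule stops
print(s1_candidates([3, 0, 0], [3, 0, 0], n, strict=True))  # []: curtailment continues
# After one failure each from Pi_2 and Pi_3, curtailment also stops.
print(s1_candidates([3, 0, 0], [3, 1, 1], n, strict=True))  # [1]
```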

Remark  5.2:  If  sampling  continues  beyond  the  stage  called  for  by  (4.1b) 
for  Goal  I  or  (4.2b)  for  Goal  II  then  the  P{CS}  is  not  increased 
(provided  that  the  total  number  of  observations  taken  from  any  population  is 
at  most  n).  VT  sampling  for  the  Bernoulli  selection  problem  always  requires 
at  least  as  large  a  total  number  of  observations  as  would  be  required  by 
a  one-at-a-time  sampling  rule  (and,  in  fact,  often  a  very  much  larger  total) 
to achieve the same P{CS} for a given data set. Thus, unless VT sampling is used for (say) "blocking" purposes in the Bernoulli problem, it should ordinarily be avoided. For an example of the latter situation see Tamhane [1980].


Remark 5.3: In some areas of application, e.g., in certain types of clinical trials and in reliability-life studies, the experiments may be started at different times, and the outcomes (successes or failures) from the k populations may be staggered or spaced over time. This might be the case in experiments which are designed as single-stage experiments, as with Sobel-Huyett, and for which n observations are to be taken from each of the k populations. In such situations for (say) Goal I, the stopping rule (4.1b) and the terminal decision rule (4.1c) can be applied as each success or failure is recorded. Then (4.1b) permits the possibility of an early terminal decision, although successes and failures will continue to be recorded as they occur after that point. These later observations make it possible to estimate the $p_i$ ($1 \le i \le k$) more precisely. They may lead to a different terminal decision, but they will not increase the probability of a correct selection.

Remark 5.4: If the common sample size n of the Sobel-Huyett single-stage procedure was chosen to guarantee the indifference-zone probability requirements given by their equations (5) or (13), then a fortiori our class of sequential procedures for s = 1 guarantees these same probability requirements. Although Sobel-Huyett did not consider Goal I or Goal II for s > 1, an analogous result would hold for those goals as well.


Remark 5.5: For large n the normal approximation to the binomial distribution can be used (as in Sobel-Huyett) to obtain an excellent approximation to the P{CS} achieved by the single-stage procedure for Goal I or Goal II for specified s and given $\mathbf{p}$. By Theorem 5.1, this computed P{CS} thus holds for our general class of sequential procedures for the same specified s and given $\mathbf{p}$.

Remark 5.6: A single-stage procedure for selecting the multinomial event which has the largest probability is described in Bechhofer, Elmaghraby and Morse [1959]; only the case s = 1 was considered. Alam and Thompson [1972] proposed a single-stage procedure for the case s = k - 1. The sequential procedures employing vector-at-a-time (VT) sampling and (S*,T*) given for the Bernoulli selection problem for Goal I ($1 \le s \le k-1$) and Goal II ($1 \le s \le k-1$) in Section 4 of the present paper are directly applicable to the multinomial selection problem (with obvious interpretations of notation).

A sequential procedure employing multinomial VT sampling with curtailment was proposed for the multinomial selection problem (s = 1) by Gibbons, Olkin and Sobel [1977], pp. 178-183. Our procedure improves on the G-O-S procedure in that it achieves the same P{CS} uniformly in $\mathbf{p}$ as does theirs (and the single-stage procedure), but our procedure requires at most as many (and usually fewer) vector-stages to terminate sampling. These results with accompanying computations are contained in Bechhofer and Kulkarni [1981].


5.2 Total number of observations to terminate sampling

In Sections 4.1 and 4.2 we described a class of sequential procedures for Goals I and II, respectively. For each, the sampling rule is arbitrary, the only restrictions being that the rule adopted take no more than n observations from any of the k populations, and that it be used in conjunction with (4.1b) and (4.1c) for Goal I or (4.2b) and (4.2c) for Goal II. If we denote by N the random total number of observations that have been taken from the k populations when sampling stops, then it can be shown that using (4.1b) and (4.1c) for Goal I we have

$$\min\{sn, (k-s)n\} \le N \le kn, \qquad (5.1)$$

or using (4.2b) and (4.2c) for Goal II we have

$$sn \le N \le kn; \qquad (5.2)$$

if an arbitrary one-at-a-time sampling rule is used, then

$$N \le kn - 1 \qquad (5.3)$$

for both Goal I and Goal II. Examples 4.1 and 4.2 show that the lower bound in (5.1) and the upper bound in (5.3) can be achieved for appropriate sampling rules and outcomes. That the procedures are closed is of particular practical importance.
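A quick Monte Carlo sketch can make the bounds (5.1) and (5.3) concrete for Goal I. It assumes a hypothetical one-at-a-time rule that picks uniformly at random among the populations with fewer than n observations; the parameter values are arbitrary, the helper names are illustrative, and a simulation only illustrates the bounds rather than proving them.

```python
import random
from itertools import combinations


def goal1_stops(z, obs, n, s):
    """True when (4.1b) holds for some split (0-based indices)."""
    k = len(z)
    ceiling = [z[i] + n - obs[i] for i in range(k)]
    for a1 in combinations(range(k), s):
        a2 = [i for i in range(k) if i not in a1]
        if min(z[i] for i in a1) >= max(ceiling[i] for i in a2):
            return True
    return False


def run_once(p, n, s, rng):
    """One run of (R, S*) for Goal I; returns N.

    R here is a hypothetical one-at-a-time rule: pick uniformly at random among
    the populations that still have fewer than n observations.  Sampling always
    stops before every population reaches n, so this list is never empty.
    """
    k = len(p)
    z, obs = [0] * k, [0] * k
    while not goal1_stops(z, obs, n, s):
        i = rng.choice([j for j in range(k) if obs[j] < n])
        obs[i] += 1
        if rng.random() < p[i]:
            z[i] += 1
    return sum(obs)


rng = random.Random(12345)
k, n, s = 4, 5, 2
p = (0.3, 0.5, 0.6, 0.8)
counts = [run_once(p, n, s, rng) for _ in range(2000)]
print(min(counts), max(counts), sum(counts) / len(counts))
```

Every simulated N should fall between $\min\{sn, (k-s)n\} = 10$ and $kn - 1 = 19$ for these settings.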

For either Goal I or Goal II with given (k, s, n), the distribution of N, and hence $E\{N \mid \mathbf{p}\}$ (and other related performance characteristics of the sequential procedure), depend on $\mathbf{p}$ and the specific sampling rule that is used. In Sections 6 and 7 we propose a particular sampling rule which has highly desirable properties when used in conjunction with (S*,T*).

6. An optimal sequential procedure for k = 2

In  this  section  and  the  next  we  continue  to  restrict  attention  to 
arbitrary  sampling  rules  R  which  take  no  more  than  n  observations  from 
any  of  the  k  populations,  and  which  are  used  in  conjunction  with  (4.1b) 
and  (4.1c)  for  Goal  I  or  (4.2b)  and  (4.2c)  for  Goal  II.  We  seek 
sampling rules within this class which have desirable properties. Indeed we
have  been  successful  in  constructing  an  optimal  rule  for  k  =  2  for  several 
definitions  of  optimality  which  are  of  considerable  practical  importance. 


Our  results  are  summarized  in  Theorems  6.1,  6.2  and  6.3,  below. 

In the sequel we let $N_{(i)}$ ($N^S_{(i)}$, $N^F_{(i)}$) denote the random number of observations (successes, failures) that have been taken from the population with parameter $p_{[i]}$ ($1 \le i \le k$) when sampling stops. Also let $N^S = \sum_{i=1}^{k} N^S_{(i)}$ and $N^F = \sum_{i=1}^{k} N^F_{(i)}$. Then $N = \sum_{i=1}^{k} N_{(i)} = N^S + N^F$. We are particularly interested in $E\{N\}$. However, also of concern is $E\{\sum_{i=1}^{k-1} N_{(i)}\}$, the expected total number of observations taken from the "inferior" populations, i.e., those having the smaller p-values; this quantity is especially important in clinical trials, where ethical considerations play an important role and $p_i$ is the probability of a "success" using treatment i ($1 \le i \le k$). (See Hoel, Sobel and Weiss [1975b].) For the same reason $E\{N^F\}$ is important. In each case we seek to make $E\{N\}$, $E\{\sum_{i=1}^{k-1} N_{(i)}\}$ and $E\{N^F\}$ as small as possible. It is obvious that for these objectives it is sufficient to restrict attention to one-at-a-time sampling rules.


6.1  Minimization  of  E{N} 

We use the following notation for k = 2. For a state $(z_{1,m}, n_{1,m}; z_{2,m}, n_{2,m})$ which does not satisfy (4.1b), let $D_m = D_m(z_{1,m}, n_{1,m}; z_{2,m}, n_{2,m})$ denote the sampling decision at stage m ($m = 0, 1, \ldots, 2n-1$). $D_m = i$ means that at stage m the next observation is to be taken from $\Pi_i$ (i = 1, 2); $D_m = (1,2)$ means that at stage m the next observation is to be taken at random from $\Pi_1$ or $\Pi_2$.
Sampling rule ($R^*$):

$$
D_m = \begin{cases}
1 & \text{if } n_{1,m} - z_{1,m} < n_{2,m} - z_{2,m}, \text{ or } n_{1,m} - z_{1,m} = n_{2,m} - z_{2,m} \text{ and } z_{1,m} > z_{2,m},\\[2pt]
2 & \text{if } n_{1,m} - z_{1,m} > n_{2,m} - z_{2,m}, \text{ or } n_{1,m} - z_{1,m} = n_{2,m} - z_{2,m} \text{ and } z_{1,m} < z_{2,m},\\[2pt]
(1,2) & \text{if } n_{1,m} - z_{1,m} = n_{2,m} - z_{2,m} \text{ and } z_{1,m} = z_{2,m}.
\end{cases} \tag{6.1}
$$
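The case analysis in (6.1) is a lexicographic comparison: fewer failures first, then more successes, then a coin toss. The Python sketch below illustrates this using helper names of our own (`r_star_k2` and `r_bar_star_k2` are hypothetical, not from the source). It returns the population index chosen by $R^*$, resolves the $D_m = (1,2)$ case with a fair coin, and obtains the conjugate rule $\bar R^*$ of Theorem 6.1 by interchanging the roles of successes and failures.

```python
import random


def r_star_k2(z1, n1, z2, n2, rng=random):
    """Sampling decision D_m of rule R* in (6.1) for k = 2.

    Population i has z_i successes in n_i observations (n_i - z_i failures).
    Returns 1 or 2; the D_m = (1,2) case is resolved by a fair coin.
    """
    # Compare (failures, -successes): fewer failures first, then more successes.
    key1 = (n1 - z1, -z1)
    key2 = (n2 - z2, -z2)
    if key1 < key2:
        return 1
    if key1 > key2:
        return 2
    return rng.choice((1, 2))  # complete tie: choose at random


def r_bar_star_k2(z1, n1, z2, n2, rng=random):
    """Conjugate rule: n_i - z_i and z_i in R* are interchanged."""
    # Passing n_i - z_i in the "successes" slot makes R* see z_i failures
    # and n_i - z_i successes, which is exactly the stated interchange.
    return r_star_k2(n1 - z1, n1, n2 - z2, n2, rng)


if __name__ == "__main__":
    print(r_star_k2(2, 3, 1, 3))      # Pi_1 has fewer failures: prints 1
    print(r_bar_star_k2(2, 3, 1, 3))  # Pi_2 has fewer successes: prints 2
```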


Theorem 6.1: Among all sampling rules $R$ used in conjunction with $(S^*,T^*)$ for $k = 2$, $R^*$ minimizes $E\{N \mid (p_1,p_2)\}$ for $p_1 + p_2 \ge 1$. The conjugate sampling rule $\bar R^*$ (in which $n_{i,m} - z_{i,m}$ and $z_{i,m}$ in $R^*$ are replaced by $z_{i,m}$ and $n_{i,m} - z_{i,m}$, respectively, for $i = 1,2$) minimizes $E\{N \mid (p_1,p_2)\}$ for $p_1 + p_2 \le 1$. Both $R^*$ and $\bar R^*$ minimize $E\{N \mid (p_1,p_2)\}$ for $p_1 + p_2 = 1$, and (for symmetry) one can choose between them with probability $(1/2, 1/2)$.
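Because the stopping rule and both sampling rules are fully specified, $E\{N \mid (p_1,p_2)\}$ can be computed exactly by recursion over the states $(z_{1,m}, n_{1,m}; z_{2,m}, n_{2,m})$. The sketch below is our own illustration, not the computation of Kulkarni [1981]. It assumes that $S^*$ (for $s = 1$) stops as soon as some $\Pi_i$ satisfies $z_{i,m} \ge z_{j,m} + n - n_{j,m}$ for all $j \ne i$ (the form used in the Appendix), and that ties in $R^*$ or $\bar R^*$ are broken uniformly at random. It evaluates, but does not prove, the comparison in Theorem 6.1. The names `stop_set`, `candidates` and `expected_n` are hypothetical.

```python
from functools import lru_cache


def stop_set(state, n):
    """Indices i with z_i >= z_j + n - n_j for every j != i (rule S*, s = 1)."""
    k = len(state)
    return [i for i in range(k)
            if all(state[i][0] >= state[j][0] + n - state[j][1]
                   for j in range(k) if j != i)]


def candidates(state, n, conjugate=False):
    """Populations R* (or its conjugate) may sample next, each w.p. 1/len."""
    open_pops = [i for i, (_, m) in enumerate(state) if m < n]

    def key(i):
        z, m = state[i]
        if conjugate:
            return (z, -(m - z))  # fewest successes, then most failures
        return (m - z, -z)        # fewest failures, then most successes

    best = min(key(i) for i in open_pops)
    return [i for i in open_pops if key(i) == best]


def expected_n(p, n, conjugate=False):
    """Exact E{N} for (R*, S*, T*), or for the conjugate rule."""

    @lru_cache(maxsize=None)
    def remaining(state):
        if stop_set(state, n):
            return 0.0
        opts = candidates(state, n, conjugate)
        total = 0.0
        for i in opts:
            z, m = state[i]
            win = state[:i] + ((z + 1, m + 1),) + state[i + 1:]
            lose = state[:i] + ((z, m + 1),) + state[i + 1:]
            total += 1.0 + p[i] * remaining(win) + (1 - p[i]) * remaining(lose)
        return total / len(opts)

    return remaining(tuple((0, 0) for _ in p))


if __name__ == "__main__":
    for p in [(0.7, 0.6), (0.2, 0.3)]:
        print(p, expected_n(p, 6), expected_n(p, 6, conjugate=True))
```

For $n = 2$ the recursion agrees with a direct enumeration of the stopping sequences, which gives $E\{N\} = 3 - \tfrac12[p_1^2 + p_2^2 + (1-p_1)p_2 + (1-p_2)p_1]$ under $R^*$.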

Proof:  The  proof  of  Theorem  6.1  is  quite  long,  and  therefore  is  not  given 
here.  It  is  given  in  detail  in  Kulkarni  [1981]  along  with  the  proofs  of 
Theorems  6.2  and  6.3  which  are  stated  below.  These  proofs  will  be 
published  elsewhere. 


Remark 6.1: Note that $R^*$ and $\bar R^*$ are stationary, i.e., the rules are independent of $n$. (Contrast this result with the one described in Remark 6.5.)

Example 6.1: To illustrate the sequential procedure which employs $(R^*,S^*,T^*)$ for $k = 2$, $s = 1$ and $n = 7$, we give the following stopping sequence:

[Stopping sequence: Cycle 1; Cycle 2; Cycle 3, truncated by $S^*$.]

$\Pi_2$ is selected after $S_2$. Note that we regard the sampling as proceeding in cycles; within each cycle (except perhaps the last one) the outcomes from each population are a sequence of successes followed by a single failure. Here the last cycle is truncated by $S^*$.

Remark 6.2: Note from $S_2$ of Example 6.1 that $R^*$ is not a PW sampling rule. (See Robbins [1956], Zelen [1969], Sobel and Weiss [1970].) $R^*$ is PW within a cycle, but may not be PW as sampling progresses from one cycle to the next. Play-the-loser for $\bar R^*$ corresponds to PW for $R^*$. Most of the sequential procedures proposed in the literature for the Bernoulli selection problem employ a PW sampling rule.


Remark 6.3: If the experimenter knows that $p_1 + p_2 > 1$ ($p_1 + p_2 < 1$), then he presumably would use $R^*$ ($\bar R^*$). However, lacking such knowledge he may be prepared to assume that $(p_1,p_2)$ represent the outcome of a random sample of size two taken from a Beta distribution

$$B(a,b):\quad \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1}(1-x)^{b-1}, \qquad 0 < x < 1,\; a,b > 0.$$

Since $P\{p > 1/2 \mid (a,b)\} > 1/2$ for $a > b$, he may wish to replace the assumption $p_1 + p_2 > 1$ ($p_1 + p_2 < 1$) by the assumption $a > b$ ($a < b$), and choose $(a,b)$ accordingly to model his assessment of the particular situation under study. Then he can use the following empirical sampling rule $R^*_E$ in conjunction with $(S^*,T^*)$.

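Empirical sampling rule ($R^*_E$):

At stage $m$ estimate $p_i$ ($i = 1,2$) by $\hat p_{i,m} = (z_{i,m} + a)/(n_{i,m} + a + b)$. Then

$$
\text{if } \hat p_{1,m} + \hat p_{2,m} \begin{cases} > 1, & \text{use } R^*,\\ < 1, & \text{use } \bar R^*,\\ = 1, & \text{use either.} \end{cases} \tag{6.2}
$$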
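Rule (6.2) switches between $R^*$ and $\bar R^*$ according to the Beta-smoothed estimates. The sketch below is an illustration under our own naming (`empirical_rule` is hypothetical). It uses exact rational arithmetic so that the "$= 1$" branch is not lost to rounding, and it implements "use either" as a fair coin, which is one admissible choice.

```python
import random
from fractions import Fraction


def r_star_k2(z1, n1, z2, n2, rng=random):
    """R* of (6.1): fewer failures first, then more successes, then a coin."""
    key1, key2 = (n1 - z1, -z1), (n2 - z2, -z2)
    if key1 != key2:
        return 1 if key1 < key2 else 2
    return rng.choice((1, 2))


def empirical_rule(z1, n1, z2, n2, a, b, rng=random):
    """Decision of R*_E in (6.2) under a Beta(a, b) prior."""
    a, b = Fraction(a), Fraction(b)
    p1_hat = (z1 + a) / (n1 + a + b)
    p2_hat = (z2 + a) / (n2 + a + b)
    total = p1_hat + p2_hat
    if total == 1:
        use_conjugate = rng.random() < 0.5  # "use either": a fair coin here
    else:
        use_conjugate = total < 1
    if use_conjugate:
        # Conjugate rule: interchange n_i - z_i and z_i before applying R*.
        return r_star_k2(n1 - z1, n1, n2 - z2, n2, rng)
    return r_star_k2(z1, n1, z2, n2, rng)


if __name__ == "__main__":
    print(empirical_rule(2, 3, 1, 3, a=2, b=1))  # estimates sum to 7/6, R*: prints 1
    print(empirical_rule(0, 3, 1, 3, a=1, b=2))  # estimates sum to 1/2, conjugate: prints 1
```

In the second call $R^*$ alone would have chosen $\Pi_2$ (it has fewer failures), so the prior changes the decision.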


Remark 6.4: Based on limited calculations for selected $(a,b)$ and $n \le 10$, it appears that the $E\{N\}$-values obtained for $R^*_E$ and for the optimal Bayes sampling rule are very close. Here the expectation is taken w.r.t. the prior Beta density.
Note: Since $\hat p_{i,m} \to p_i$ ($i = 1,2$) as $m \to \infty$, an error in the choice of the particular $(a,b)$ will have little effect on the performance of $(R^*_E,S^*,T^*)$ when $n$ is large.

6.2 Minimization of $E\{N_{(1)}\}$

We  had  mentioned  that  in  clinical  trials  it  would  be  desirable  to 
minimize  the  expected  total  number  of  observations  taken  from  the  populations 
with  small  p- values.  Our  next  theorem  addresses  that  issue  for  k  =  2. 

Theorem 6.2: Among all sampling rules $R$ used in conjunction with $(S^*,T^*)$ for $k = 2$, $R^*$ minimizes $E\{N_{(1)} \mid (p_1,p_2)\}$ for $p_{[2]} \ge 1/2$.
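The quantity in Theorem 6.2 can be evaluated with the same kind of recursion, counting only the observations taken from the population with the smaller $p$. This is again a sketch under our own assumptions ($S^*$ in the Appendix form, uniform tie-breaking, hypothetical function names). It illustrates the theorem rather than proving it.

```python
from functools import lru_cache


def stop_set(state, n):
    """Indices i with z_i >= z_j + n - n_j for every j != i (rule S*, s = 1)."""
    k = len(state)
    return [i for i in range(k)
            if all(state[i][0] >= state[j][0] + n - state[j][1]
                   for j in range(k) if j != i)]


def candidates(state, n, conjugate=False):
    """Populations R* (or its conjugate) may sample next, each w.p. 1/len."""
    open_pops = [i for i, (_, m) in enumerate(state) if m < n]

    def key(i):
        z, m = state[i]
        return (z, -(m - z)) if conjugate else (m - z, -z)

    best = min(key(i) for i in open_pops)
    return [i for i in open_pops if key(i) == best]


def expected_inferior(p, n, conjugate=False):
    """Exact E{N_(1)} for k = 2: observations taken from the population with smaller p."""
    worst = 0 if p[0] < p[1] else 1

    @lru_cache(maxsize=None)
    def remaining(state):
        if stop_set(state, n):
            return 0.0
        opts = candidates(state, n, conjugate)
        total = 0.0
        for i in opts:
            z, m = state[i]
            win = state[:i] + ((z + 1, m + 1),) + state[i + 1:]
            lose = state[:i] + ((z, m + 1),) + state[i + 1:]
            cost = 1.0 if i == worst else 0.0  # count only draws from the inferior one
            total += cost + p[i] * remaining(win) + (1 - p[i]) * remaining(lose)
        return total / len(opts)

    return remaining(((0, 0), (0, 0)))


if __name__ == "__main__":
    print(expected_inferior((0.3, 0.7), 6), expected_inferior((0.3, 0.7), 6, True))
```

For $(p_1,p_2) = (0.4, 0.6)$ and $n = 2$ the values are 1.14 under $R^*$ and 1.34 under $\bar R^*$, which can be checked by listing the stopping sequences by hand.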

Remark 6.5: If $p_{[2]} < 1/2$, it is not possible to find a stationary sampling rule which, when used in conjunction with $(S^*,T^*)$ for $k = 2$, will minimize $E\{N_{(1)} \mid (p_1,p_2)\}$ for all $(p_1,p_2)$, as is illustrated by the following example.

Example 6.2: Let $k = 2$ and suppose that $p_{[1]} = 0.085$, $p_{[2]} = 0.250$. Suppose that $z_{1,1} = 1$, $n_{1,1} = 1$, $z_{2,1} = 0$, $n_{2,1} = 0$. Using dynamic programming (DP) it can be shown that at stage 1 the sampling rule that minimizes $E\{N_{(1)}\}$ for this particular pair $(p_{[1]},p_{[2]})$ and outcome $(1,1;0,0)$ takes the next observation from one of the populations if $n = 2$ and from the other population if $n = 3$. Thus the optimal sampling rule depends on $n$, and hence is not stationary.

6.3 Minimization of $E\{N^F\}$

In  clinical  trials  it  is  undesirable  to  obtain  failures  with  any  of  the 
treatments employed in the trial. Our next theorems relate to that problem for $k = 2$.
Theorem 6.3A: Among all sampling rules $R$ used in conjunction with $(S^*,T^*)$ for $k = 2$, $R^*$ minimizes $E\{N^F \mid (p_1,p_2)\}$ for $p_1 + p_2 \ge 1$.

Theorem 6.3B: Among all sampling rules $R$ used in conjunction with $(S^*,T^*)$ for $k = 2$, $\bar R^*$ minimizes $E\{N^F \mid (p_1,p_2)\}$ for $p_{[2]} \le 1/2$.

Remark 6.7: If $p_{[1]} + p_{[2]} < 1$ and $p_{[2]} > 1/2$, there exist points $(p_1,p_2)$ such that neither $R^*$ nor $\bar R^*$, when used in conjunction with $(S^*,T^*)$ for $k = 2$, will minimize $E\{N^F\}$, as is illustrated by the following example.


Example 6.3: Let $k = 2$ and suppose that $n = 2$ and $p_{[1]} = 0.10$, $p_{[2]} = \ldots$. Suppose that $z_{1,1} = 1$, $n_{1,1} = 1$, $z_{2,1} = 0$, $n_{2,1} = 0$; using DP it can be shown that at stage 1 the $R$ that minimizes $E\{N^F\}$ for this particular pair $(p_{[1]},p_{[2]})$ and outcome is $R^*$. Suppose now that $z_{1,1} = 0$, $n_{1,1} = 1$, $z_{2,1} = 0$, $n_{2,1} = 0$; using DP it can be shown that at stage 1 the $R$ that minimizes $E\{N^F\}$ for the same particular pair $(p_{[1]},p_{[2]})$ but different outcome is $\bar R^*$.

Theorems 5.1, 6.1, 6.2 and 6.3 summarize four highly desirable properties of $R^*$ when used in conjunction with $(S^*,T^*)$ for the two-population Bernoulli selection problem. In Section 7 we consider sampling rules for the $k$-population ($k \ge 3$) problem.


7. Proposed sampling rule for Goals I and II for $k > 2$

In this section we propose a natural generalization to $k > 2$ of the sampling rule $R^*$. This generalized $R^*$ (which we still refer to as $R^*$, since it reduces to $R^*$ when $k = 2$), when used in conjunction with $(S^*,T^*)$, is thus a member of the class of sequential procedures described in Section 4; hence Theorem 5.1 applies. We describe some of its desirable properties in Section 7.1, and conjecture an optimal property in Section 7.2.

Generalized sampling rule ($R^*$):

At stage $m$ ($0 \le m \le kn-1$), if sampling has not stopped, take the next observation from the population which has the smallest number of failures among all $\Pi_i$ for which $n_{i,m} < n$ ($1 \le i \le k$). If there is a tie among such equal-number-of-failure populations, take the next observation from that one of them which has the largest number of successes. If there is a further tie among such equal-number-of-success populations, select one of them at random and take the next observation from it.
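The three tie-breaking levels of the generalized rule translate directly into code. This is a minimal sketch with a hypothetical function name and argument layout (`z[i]` successes and `obs[i]` observations from $\Pi_i$, indexed from 0); random tie-breaking uses Python's `random` module.

```python
import random


def generalized_r_star(z, obs, n, rng=random):
    """Index of the population sampled next by generalized R*.

    z[i], obs[i]: successes and observations so far from population i.
    Only populations with obs[i] < n are eligible.
    """
    eligible = [i for i in range(len(z)) if obs[i] < n]
    if not eligible:
        raise ValueError("every population already has n observations")
    # 1) smallest number of failures
    fewest = min(obs[i] - z[i] for i in eligible)
    tied = [i for i in eligible if obs[i] - z[i] == fewest]
    # 2) among those, largest number of successes
    most = max(z[i] for i in tied)
    tied = [i for i in tied if z[i] == most]
    # 3) any remaining tie is broken at random
    return rng.choice(tied)


if __name__ == "__main__":
    print(generalized_r_star([3, 1, 2], [4, 1, 3], n=5))  # failures 1, 0, 1: prints 1
```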

Remark 7.1: We can think of the sampling rule $R^*$ as proceeding in cycles. Before the start of sampling the populations are arranged in random order, say, $\Pi'_1, \Pi'_2, \ldots, \Pi'_k$. The first cycle is started by taking one observation at a time from $\Pi'_1$ until a failure is obtained. Then observations are taken one-at-a-time from $\Pi'_2$ until a failure is obtained. This process is continued until every population has produced a sequence of successes followed by a single failure. Then the first cycle is complete, and every population has produced exactly one failure (unless truncation has occurred during the cycle). Cycle $i$ is started by taking observations from the population which has the largest cumulative number of successes through cycle $i-1$ ($2 \le i \le c$), where $c$ is the random total number of cycles until the termination of sampling. That population is sampled until a failure is obtained. The cycle is continued by sampling from the population which has the second largest cumulative number of successes through cycle $i-1$, and sampling from that population is continued until a failure is obtained. This process is continued until in cycle $i$ every population has produced a sequence of successes followed by a single failure. Then the $i$th cycle is complete, and every population has produced a cumulative number of exactly $i$ failures (unless truncation has occurred during the $i$th cycle). If within a cycle two or more populations which have not yet been sampled in that cycle have the same cumulative number of successes through cycle $i-1$, then they are sampled in random order.
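The cycle description can be checked by simulation. The sketch below uses hypothetical names and our own assumptions about $S^*$ ($s = 1$, Appendix form) and $T^*$ (random choice among the populations satisfying the stopping condition). It records each observation with its cycle number, taken to be one plus the number of failures the population had produced before that observation. Under $R^*$ these labels never decrease along the sequence, and within a cycle each population's observations form one unbroken run.

```python
import random


def run_r_star(p, n, rng):
    """Simulate (R*, S*, T*) for s = 1; return (sequence, selected index).

    Each sequence entry is (population, "S" or "F", cycle number).
    """
    k = len(p)
    z, obs = [0] * k, [0] * k
    seq = []
    while True:
        # S*: stop once some i has z_i >= z_j + n - n_j for all j != i.
        winners = [i for i in range(k)
                   if all(z[i] >= z[j] + n - obs[j] for j in range(k) if j != i)]
        if winners:
            return seq, rng.choice(winners)  # T*: random choice among them
        eligible = [i for i in range(k) if obs[i] < n]
        fewest = min(obs[i] - z[i] for i in eligible)
        tied = [i for i in eligible if obs[i] - z[i] == fewest]
        most = max(z[i] for i in tied)
        i = rng.choice([j for j in tied if z[j] == most])
        cycle = obs[i] - z[i] + 1  # failures so far from this population, plus one
        success = rng.random() < p[i]
        obs[i] += 1
        z[i] += success
        seq.append((i, "S" if success else "F", cycle))


def runs_are_contiguous(seq):
    """True if, for every (population, cycle), its entries are consecutive."""
    closed, prev = set(), None
    for pop, _, cycle in seq:
        key = (pop, cycle)
        if key != prev:
            if key in closed:
                return False
            if prev is not None:
                closed.add(prev)
            prev = key
    return True


if __name__ == "__main__":
    seq, chosen = run_r_star((0.3, 0.5, 0.8), n=8, rng=random.Random(7))
    for c in sorted({cyc for _, _, cyc in seq}):
        print(f"Cycle {c}:", " ".join(f"{o}{pop + 1}" for pop, o, cyc in seq if cyc == c))
    print("selected population:", chosen + 1)
```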

Remark 7.2: Sampling rule $R^*$ had been proposed earlier for $k > 2$ by a referee of Sobel and Weiss [1972a]; see the sampling rule referred to on pp. 1809 and 1824 of their paper. This sampling rule was to be used in conjunction with a stopping rule based on inverse sampling. However, as noted above, our motivation for proposing $R^*$ is that it is a natural generalization to $k > 2$ of the sampling rule of (6.1).

Example 7.1: To illustrate the sequential procedure which employs $(R^*,S^*,T^*)$ for $k = 3$, $s = 1$, $n = 8$, we give the following stopping sequence:

[Stopping sequence for $\Pi_1$, $\Pi_2$, $\Pi_3$: Cycle 1; Cycle 2; Cycle 3, truncated by $S^*$.]

Remark  7.3:  The  stopping  sequences  given  in  Examples  4.1  -  4.4  could  have 
been  obtained  using  (R*,S*,T*). 

Example 7.2: We illustrate Theorem 5.1 by calculating the exact $P\{CS\}$ when $(R_{SS},T_{SS})$ and $(R^*,S^*,T^*)$ are used for $k = 3$, $s = 1$, $n = 1$ (as in Example 4.1).


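Table 7.1

$$
\begin{array}{llll}
\text{Outcomes}^{1/} \text{ leading to CS} & \text{Probability of outcome}^{1/} & \text{Stopping sequences}^{1/} \text{ leading} & \text{Probability of stopping}\\
\text{for } (R_{SS},T_{SS}) & \text{and then CS} & \text{to CS for } (R^*,S^*,T^*) & \text{sequence}^{1/} \text{ and then CS}\\
\hline
(S_1,S_2,S_3) & p_1 p_2 p_3 \cdot \tfrac13 & S_3 & \tfrac13\, p_3\\
(F_1,S_2,S_3) & (1-p_1) p_2 p_3 \cdot \tfrac12 & F_1\, S_3 & \tfrac13 (1-p_1) \cdot \tfrac12\, p_3\\
(S_1,F_2,S_3) & p_1 (1-p_2) p_3 \cdot \tfrac12 & F_2\, S_3 & \tfrac13 (1-p_2) \cdot \tfrac12\, p_3\\
(F_1,F_2,S_3) & (1-p_1)(1-p_2) p_3 & F_1\, F_2 & \tfrac13 (1-p_1) \cdot \tfrac12 (1-p_2)\\
(F_1,F_2,F_3) & (1-p_1)(1-p_2)(1-p_3) \cdot \tfrac13 & F_2\, F_1 & \tfrac13 (1-p_2) \cdot \tfrac12 (1-p_1)\\
\hline
\end{array}
$$

For both procedures

$$P\{CS\} = \frac13 - \frac13(p_1 + p_2 - 2p_3) + \frac16\,[2p_1p_2 - (p_1 + p_2)p_3].$$

1/ For simplicity of notation in this example we have assumed that $\max(p_1,p_2) < p_3$.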
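Table 7.1 can be reproduced, and Theorem 5.1 illustrated for other $(k,n)$, by computing both probabilities of a correct selection exactly. In this sketch (our own construction; the function names are hypothetical), the sequential value is obtained by recursion over states under $R^*$ with $S^*$ and $T^*$ in the Appendix form. The single-stage value is obtained by summing over all binomial outcomes with random tie-breaking. The best population is assumed to be unique.

```python
from functools import lru_cache
from itertools import product
from math import comb


def stop_set(state, n):
    """Indices i with z_i >= z_j + n - n_j for every j != i (rule S*, s = 1)."""
    k = len(state)
    return [i for i in range(k)
            if all(state[i][0] >= state[j][0] + n - state[j][1]
                   for j in range(k) if j != i)]


def r_star_candidates(state, n):
    """Populations generalized R* may sample next, each w.p. 1/len."""
    open_pops = [i for i, (_, m) in enumerate(state) if m < n]
    best = min((state[i][1] - state[i][0], -state[i][0]) for i in open_pops)
    return [i for i in open_pops
            if (state[i][1] - state[i][0], -state[i][0]) == best]


def pcs_sequential(p, n):
    """Exact P{CS} for (R*, S*, T*); T* picks at random among the stop set."""
    best = max(range(len(p)), key=lambda i: p[i])

    @lru_cache(maxsize=None)
    def value(state):
        winners = stop_set(state, n)
        if winners:
            return (best in winners) / len(winners)
        opts = r_star_candidates(state, n)
        total = 0.0
        for i in opts:
            z, m = state[i]
            win = state[:i] + ((z + 1, m + 1),) + state[i + 1:]
            lose = state[:i] + ((z, m + 1),) + state[i + 1:]
            total += p[i] * value(win) + (1 - p[i]) * value(lose)
        return total / len(opts)

    return value(tuple((0, 0) for _ in p))


def pcs_single_stage(p, n):
    """Exact P{CS} for (R_SS, T_SS): n observations each, random tie-breaking."""
    best = max(range(len(p)), key=lambda i: p[i])
    total = 0.0
    for ys in product(range(n + 1), repeat=len(p)):
        prob = 1.0
        for y, pi in zip(ys, p):
            prob *= comb(n, y) * pi ** y * (1 - pi) ** (n - y)
        top = max(ys)
        winners = [i for i, y in enumerate(ys) if y == top]
        if best in winners:
            total += prob / len(winners)
    return total


if __name__ == "__main__":
    p1, p2, p3 = 0.2, 0.5, 0.8
    closed = 1 / 3 - (p1 + p2 - 2 * p3) / 3 + (2 * p1 * p2 - (p1 + p2) * p3) / 6
    print(pcs_sequential((p1, p2, p3), 1), pcs_single_stage((p1, p2, p3), 1), closed)
```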


7.1 Some properties of the procedure $(R^*,S^*,T^*)$

The sequential procedure employing $(R^*,S^*,T^*)$ has the following properties:

a) $P\{CS \mid (R^*,S^*,T^*), \mathbf{p}\} = P\{CS \mid (R_{SS},T_{SS}), \mathbf{p}\}$ uniformly in $\mathbf{p}$ for both Goal I and Goal II.

b) For Goal I
$$\min\{sn, (k-s)n\} \le N \le kn - 1,$$
and for Goal II
$$sn \le N \le kn - 1$$
for all $\mathbf{p}$. These bounds on $N$ can be achieved.

c) For Goal II, and analogously for Goal I, for all $\mathbf{p}$:
$$\frac{1}{kn}(100) \le \frac{kn - E\{N\}}{kn}(100) \le \frac{k-s}{k}(100).$$
Here $(kn - E\{N\})100/kn$ is the percent saving in expected total number of observations if $(R^*,S^*,T^*)$ is used in place of the corresponding single-stage procedure. This saving is always positive and can be very large for $p_{[1]} \to 1$.

d) $P\{N = sn \mid \mathbf{p}\} \to 1$ for $p_{[1]} \to 1$ for Goals I and II,
$P\{N = kn - s \mid \mathbf{p}\} \to 1$ for $p_{[k]} \to 0$ for Goal I,
$P\{N = kn - 1 \mid \mathbf{p}\} \to 1$ for $p_{[k]} \to 0$ for Goal II.

e) Populations with small $p$-values tend to be sampled less frequently.

f) No special tables of constants are necessary to carry out $(R^*,S^*,T^*)$ for $k > 2$, and it is very easy to implement.

We note from b) that using $R^*$ instead of an arbitrary $R$ with $(S^*,T^*)$ yields a smaller upper bound ($kn - 1$ instead of $kn$) for $N$. The fact that the sequential procedure employing $(R^*,S^*,T^*)$ is closed increases its potential for use in real-life applications.
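Properties b) and c) can be made concrete for a given $\mathbf{p}$ by computing $E\{N\}$ together with the smallest and largest values of $N$ that occur with positive probability. The sketch below assumes $s = 1$, $0 < p_i < 1$, and $S^*$ in the Appendix form; the function `n_summary` is a hypothetical name of our own.

```python
from functools import lru_cache


def stop_set(state, n):
    """Indices i with z_i >= z_j + n - n_j for every j != i (rule S*, s = 1)."""
    k = len(state)
    return [i for i in range(k)
            if all(state[i][0] >= state[j][0] + n - state[j][1]
                   for j in range(k) if j != i)]


def r_star_candidates(state, n):
    """Populations generalized R* may sample next, each w.p. 1/len."""
    open_pops = [i for i, (_, m) in enumerate(state) if m < n]
    best = min((state[i][1] - state[i][0], -state[i][0]) for i in open_pops)
    return [i for i in open_pops
            if (state[i][1] - state[i][0], -state[i][0]) == best]


def n_summary(p, n):
    """Exact E{N} and the smallest and largest attainable N under (R*, S*, T*).

    Assumes 0 < p_i < 1, so every branch of the recursion has positive probability.
    """

    @lru_cache(maxsize=None)
    def go(state):
        if stop_set(state, n):
            return 0.0, 0, 0
        opts = r_star_candidates(state, n)
        mean, lo, hi = 0.0, None, None
        for i in opts:
            z, m = state[i]
            for child, prob in (
                (state[:i] + ((z + 1, m + 1),) + state[i + 1:], p[i]),
                (state[:i] + ((z, m + 1),) + state[i + 1:], 1 - p[i]),
            ):
                e, a, b = go(child)
                mean += prob * (1 + e) / len(opts)
                lo = 1 + a if lo is None else min(lo, 1 + a)
                hi = 1 + b if hi is None else max(hi, 1 + b)
        return mean, lo, hi

    return go(tuple((0, 0) for _ in p))


if __name__ == "__main__":
    k, n = 3, 4
    e, lo, hi = n_summary((0.3, 0.5, 0.7), n)
    saving = 100 * (k * n - e) / (k * n)  # percent saving vs. the single stage
    print(f"E{{N}} = {e:.4f}, N ranges over [{lo}, {hi}], saving = {saving:.1f}%")
```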


7.2 Some conjectures concerning the procedure $(R^*,S^*,T^*)$

For $k > 2$ we make a conjecture for $(R^*,S^*,T^*)$ that is a generalization of Theorem 6.1. The conjecture is made only for the case $s = 1$. When $s > 1$ the situation is much more complicated, and it is very difficult to conjecture an optimal sampling rule even for a limited region of the parameter space. (See Remark 7.7.)

Conjecture 7.1: Among all sampling rules $R$ used in conjunction with $(S^*,T^*)$ for $k > 2$, $s = 1$, $R^*$ minimizes $E\{N \mid \mathbf{p}\}$ for $p_{[1]} + p_{[2]} \ge 1$.

Note: If this conjecture is true, then for $s = k - 1$ generalized $\bar R^*$ (defined in the obvious way) minimizes $E\{N \mid \mathbf{p}\}$ for $p_{[k-1]} + p_{[k]} \le 1$.

Remark 7.4: Conjecture 7.1 was checked numerically for $k = 3$, $s = 1$, $n = 1(1)6$ over a fairly fine grid in the region $p_{[1]} + p_{[2]} \ge 1$, and was found to be true.

Remark 7.5: For $k = 3$, $s = 1$, $n = 1(1)6$ the condition $p_{[1]} + p_{[2]} \ge 1$ of Conjecture 7.1 is not necessary. For example, it was found by solving the dynamic programming equations on the computer that $R^*$ minimizes $E\{N \mid \mathbf{p}\}$ for $(p_{[1]},p_{[2]},p_{[3]}) = (0.35, 0.5, 0.7)$.

Remark 7.6: For $k = 3$, $s = 1$ it is not possible to find a stationary sampling rule that minimizes $E\{N \mid \mathbf{p}\}$ for all $\mathbf{p}$ such that $p_{[1]} + p_{[2]} < 1$, as is illustrated by the following example:

Example 7.1: ($k = 3$, $s = 1$, $n \ge 3$) Let $(p_{[1]},p_{[2]},p_{[3]}) = (\ldots)$. Consider the outcomes

[Table of outcomes: observations, successes and failures for each population.]

For $n = 3$ the optimal sampling rule takes the next observation from a population other than the one chosen by $R^*$. However, for $n = 4$ the optimal sampling rule takes the next observation from a different population than it does for $n = 3$. Thus the optimal sampling rule is not stationary.


Remark 7.7: For ($k = 3$, $s = 2$, $n = 3$), Goal I, the following example shows that $R^*$ used with $(S^*,T^*)$ does not minimize $E\{N \mid \mathbf{p}\}$ for any $\mathbf{p}$ such that $0 < p_{[1]} < p_{[3]} < 1$:

Example 7.2:

[Table of outcomes: observations and successes for each population.]

$R^*$ takes the next observation from one of the populations. However, the optimal sampling rule takes no more observations from that population, but chooses at random between the other two populations for the next observation. (A similar example can be constructed for …)


In Theorems 6.1 - 6.3 and Conjecture 7.1 we have limited consideration to a class of sampling rules which take at most $n$ observations from each of the $k$ populations, and which are used in conjunction with $(S^*,T^*)$. We believe that the conclusions given in these theorems and in the conjecture hold for a broader class of stopping and terminal decision rules. Our belief is summarized in the following conjecture.

Conjecture 7.2: Among all sampling rules $R$ used in conjunction with a stopping rule ($S$) and terminal decision rule ($T$) which achieve the same $P\{CS\}$ as $(R_{SS},T_{SS})$ uniformly in $\mathbf{p}$, $(R^*,S^*,T^*)$ has the optimal properties described in Theorems 6.1 - 6.3 and Conjecture 7.1.

8. Performance of $(R^*,S^*,T^*)$ relative to that of other Bernoulli selection procedures


For the purpose of comparing the performance characteristics of $(R^*,S^*,T^*)$ with those of some competing Bernoulli selection procedures, we have categorized the latter into two groups: those that achieve the same $P\{CS\}$ as $(R^*,S^*,T^*)$ uniformly in $\mathbf{p}$, and those that do not.

8.1 Procedures having the same $P\{CS\}$ as $(R^*,S^*,T^*)$

a) Hoel [1972] proposed a closed sequential procedure for $k = 2$ employing a PW sampling rule ($R_{PW}$). It can be shown that Hoel's procedure employs the stopping rule ($S^*$) and terminal decision rule ($T^*$). His stopping rule depends on a constant $r$ which, for a specified $\{\Delta^*,P^*\}$, is actually the same as the $n$-value necessary to implement the Sobel-Huyett single-stage procedure $(R_{SS},T_{SS})$ for the same specification. Thus $(R_{PW},S^*,T^*)$ achieves the same $P\{CS\}$ uniformly in $\mathbf{p}$ as do $(R^*,S^*,T^*)$ and $(R_{SS},T_{SS})$. [...] and $P^* = 0.90, 0.95, 0.99$. Using these ($r = n$)-values for $(R^*,S^*,T^*)$, $(R_{SS},T_{SS})$ and $(R_{PW},S^*,T^*)$ [...], we have

$$E\{N \mid \mathbf{p}; (R^*,S^*,T^*)\} \le E\{N \mid \mathbf{p}; (R_{PW},S^*,T^*)\} \quad \text{for } p_1 + p_2 \ge 1,$$

as follows from Theorem 6.1. Moreover, it follows from Theorem 6.2 that $E\{N_{(1)} \mid \mathbf{p}; (R^*,S^*,T^*)\} \le E\{N_{(1)} \mid \mathbf{p}; (R_{PW},S^*,T^*)\}$ if $p_{[2]} \ge 1/2$; in fact, it appears from our computations that the inequality holds for all $p_{[2]} > p_{[1]}$. See also Pradhan and Sathe [1974].
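The comparison with a PW rule can be checked numerically by computing $E\{N\}$ exactly for both rules with the same $(S^*,T^*)$. The PW rule below is an assumed variant and may differ in detail from Hoel [1972]. It chooses the first population at random, stays after a success and switches after a failure, and, as our own truncation convention, samples the other population when the indicated one already has $n$ observations. All function names are hypothetical.

```python
from functools import lru_cache


def stop_set(state, n):
    """Indices i with z_i >= z_j + n - n_j for every j != i (rule S*, s = 1)."""
    return [i for i in (0, 1)
            if state[i][0] >= state[1 - i][0] + n - state[1 - i][1]]


def children(state, i):
    """States after a success and after a failure on population i."""
    z, m = state[i]
    win = state[:i] + ((z + 1, m + 1),) + state[i + 1:]
    lose = state[:i] + ((z, m + 1),) + state[i + 1:]
    return win, lose


def expected_n_r_star(p, n):
    """Exact E{N} for (R*, S*, T*) with k = 2."""

    @lru_cache(maxsize=None)
    def remaining(state):
        if stop_set(state, n):
            return 0.0
        open_pops = [i for i in (0, 1) if state[i][1] < n]
        key = lambda i: (state[i][1] - state[i][0], -state[i][0])
        best = min(key(i) for i in open_pops)
        opts = [i for i in open_pops if key(i) == best]
        total = 0.0
        for i in opts:
            win, lose = children(state, i)
            total += 1.0 + p[i] * remaining(win) + (1 - p[i]) * remaining(lose)
        return total / len(opts)

    return remaining(((0, 0), (0, 0)))


def expected_n_pw(p, n):
    """Exact E{N} for the assumed play-the-winner rule with (S*, T*), k = 2."""

    @lru_cache(maxsize=None)
    def remaining(state, last):
        if stop_set(state, n):
            return 0.0
        if last is None:
            opts = [0, 1]                       # first draw: either population
        else:
            pop, success = last
            target = pop if success else 1 - pop
            if state[target][1] >= n:           # truncation: use the other one
                target = 1 - target
            opts = [target]
        total = 0.0
        for i in opts:
            win, lose = children(state, i)
            total += (1.0 + p[i] * remaining(win, (i, True))
                      + (1 - p[i]) * remaining(lose, (i, False)))
        return total / len(opts)

    return remaining(((0, 0), (0, 0)), None)


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, expected_n_r_star((0.7, 0.6), n), expected_n_pw((0.7, 0.6), n))
```

For $n = 2$ the two rules produce the same distribution of $N$, so any difference first appears at $n \ge 3$.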


b) Nebenzahl and Sobel [1972] proposed two sequential procedures, the first employing a vector-at-a-time sampling rule ($R_{VT}$) and the second employing a PW sampling rule ($R_{PW}$), both stopping when a fixed total number of observations $n_T$ had been taken. They (N-S) chose $n_T$ to guarantee the $\{\Delta^*,P^*\}$ indifference-zone probability requirement. N-S showed that for the same $n_T$ both procedures achieved the same $P\{CS\}$ uniformly in $\mathbf{p}$, and that the total number of observations taken from the population with $p_{[1]}$ was never more for $R_{PW}$ than for $R_{VT}$. If $n_T$ is even (odd), it can be shown that the $P\{CS\}$ using $(R^*,S^*,T^*)$ with $n = n_T/2$ ($n = (n_T+1)/2$) is equal to that achieved by the N-S procedures with $n_T$, uniformly in $\mathbf{p}$. Also, if $n_T$ is even (odd), then $E\{N\}$ using $(R^*,S^*,T^*)$ with $n = n_T/2$ ($n = (n_T+1)/2$) is smaller (smaller except when $p_1 = p_2 = 0$) than that obtained using the N-S procedures with $n_T$, uniformly in $\mathbf{p}$. See Büringer, Martin and Schriever [1980], p. 262, for further comments. Also see Pradhan and Sathe [1973], [1976].


c) Bofinger [1978] proposed a variant of a PW sampling rule with curtailment which is applicable to the Bernoulli selection problem for $k > 2$. She considered our Goal I, her "best" populations being the ones associated with $(p_{[1]}, p_{[2]}, \ldots, p_{[k-s]})$. (Her PW corresponds to our play-the-loser.) Bofinger's procedure achieves the same $P\{CS\}$ uniformly in $\mathbf{p}$ as does $(R_{SS},T_{SS})$ which uses the same $n$-value, and hence also as does $(R,S^*,T^*)$. It would appear that generalized $(\bar R^*,S^*,T^*)$ would have smaller $E\{N\}$ than that of Bofinger's procedure, at least for $s = k - 1$ when $p_{[k-1]} + p_{[k]} \le 1$. (See our Conjecture 7.1.) However, we have made no computations to verify this assertion.


8.2 Procedures having different $P\{CS\}$ from $(R^*,S^*,T^*)$

a) Sobel and Weiss [1972a] proposed open sequential procedures for $k > 2$ employing VT and several PW sampling rules, all of which employed a stopping rule based on inverse sampling. All of the Sobel-Weiss (S-W) procedures stopped sampling as soon as any population produced $r$ successes. The constant $r$ was chosen to guarantee the $\{\Delta^*,P^*\}$ probability requirement, and was shown to have the same value for all of their procedures. (Berry and Young [1977] proved that these procedures using the same $r$ achieved the same $P\{CS\}$ uniformly in $\mathbf{p}$.) However, if $r$ of the S-W procedures and $n$ of the Sobel-Huyett (S-H) procedure were chosen to guarantee the same $\{\Delta^*,P^*\}$ requirement, then the $P\{CS\}$ achieved by these procedures is not the same uniformly in $\mathbf{p}$. If $(R^*,S^*,T^*)$ is used with the S-H $n$, then from Theorem 5.1 the $P\{CS\}$ achieved by $(R^*,S^*,T^*)$ and by $(R_{SS},T_{SS})$ is the same uniformly in $\mathbf{p}$, but this $P\{CS\}$ differs from that of the S-W procedures which use the corresponding $r$. This fact makes it difficult to compare the $E\{N\}$-values for $(R^*,S^*,T^*)$ with those of the S-W procedures.

If $p_{[1]} \to 1$, then the S-W procedures which use one-at-a-time sampling have smaller $E\{N\}$ than does the corresponding $(R^*,S^*,T^*)$. However, if $p_{[k]} \to 0$, then $E\{N\} \to \infty$ for the S-W procedures, while $N \le kn - 1$ for $(R^*,S^*,T^*)$.

b) Berry and Sobel [1973] proposed a closed sequential procedure for $k = 2$ employing a PW sampling rule. Sampling stops when either population produces $r$ successes or both produce at least $c$ failures. The constants $(r,c)$ were chosen to guarantee the $\{\Delta^*,P^*\}$ requirement; Berry and Sobel (B-S) recommend the choice $r = c$ as optimal. As with the S-W procedures, the B-S procedure has a $P\{CS\}$-function which differs from that of $(R_{SS},T_{SS})$ if $r = c$ and $n$ are chosen to guarantee the same $\{\Delta^*,P^*\}$ requirement. A fortiori the B-S procedure has a different $P\{CS\}$-function than does $(R^*,S^*,T^*)$ which uses the $n$ of $(R_{SS},T_{SS})$.

The B-S procedure has very desirable $E\{N\}$ behavior relative to that of $(R^*,S^*,T^*)$, but less desirable $P\{CS\}$ behavior. We determined $n$ and $r$ corresponding to the nine $\{\Delta^*,P^*\}$ combinations for $\Delta^* = 0.1, 0.2, 0.3$ and $P^* = 0.90, 0.95, 0.99$, and set $p_{[2]} - p_{[1]} = \Delta^*$ with $p_{[2]}$ varying. We found that $E\{N \mid \text{B-S}\} < E\{N \mid (R^*,S^*,T^*)\}$ except in a small neighborhood of $(p_{[1]},p_{[2]}) = (1/2 - \Delta^*/2,\ 1/2 + \Delta^*/2)$, where $P\{CS \mid (R^*,S^*,T^*)\}$ achieves its minimum in the preference zone. (Note: $P\{CS \mid \text{B-S}\}$ achieves its minimum in the preference zone at $(p_{[1]},p_{[2]}) = (1/3 - \Delta^*/2,\ 1/3 + \Delta^*/2)$ and at $(2/3 - \Delta^*/2,\ 2/3 + \Delta^*/2)$; this result is an asymptotic ($P^* \to 1$) one.) However, $P\{CS \mid \text{B-S}\} < P\{CS \mid (R^*,S^*,T^*)\}$ except in approximately this same small neighborhood. Thus it appears that the decrease in $E\{N\}$ for B-S is purchased at the cost of a decrease in $P\{CS\}$. (Small changes in $P\{CS\}$ result in large changes in $E\{N\}$ when the $P\{CS\}$ is close to unity.) We also found that $\max\{N \mid \text{B-S}\} > \max\{N \mid (R^*,S^*,T^*)\}$ for most $\{\Delta^*,P^*\}$ of …
c) Schriever [1978/79] generalized the B-S procedure to $k > 2$. His procedure, which employs a PW sampling rule, stops sampling when any population produces $r$ successes or every population produces at least $c$ failures. The constants $(r,c)$ are chosen to guarantee the $\{\Delta^*,P^*\}$ requirement; Schriever too recommended the choice $r = c$ as optimal. At this time we do not have sufficient exact calculations to compare the performance of Schriever's procedure with that of $(R^*,S^*,T^*)$ for $k > 2$.
Appendix: Proof of Theorem 5.1.

We shall prove Theorem 5.1 for the case $s = 1$; the proof for $s > 1$ proceeds in a similar manner. In this Appendix we denote by $p_1 \le p_2 \le \cdots \le p_k$ the ordered $p$'s, i.e., $p_i = p_{[i]}$. For $0 \le m \le kn$ define

$$\Omega^n_m = \Big\{x_m = (y_{1,m}, n_{1,m}; \ldots; y_{k,m}, n_{k,m}) : 0 \le y_{i,m} \le n_{i,m} \le n \ (1 \le i \le k),\ \sum_{i=1}^k n_{i,m} = m\Big\},$$

where $y_{i,m}$ and $n_{i,m}$ are the number of successes and number of observations, respectively, from $\Pi_i$ through stage $m$ ($1 \le i \le k$). (Here $\Pi_i$ has probability of success $p_i$.) To emphasize the dependence of $y_{i,m}$ and $n_{i,m}$ on $x_m$ we shall use the notation

$$y_i(x_m) = y_{i,m}, \qquad n_i(x_m) = n_{i,m} \qquad (1 \le i \le k).$$


For $0 \le m \le kn$ define the set of stopping states at stage $m$ by

$$S^n_m = \{x_m \in \Omega^n_m : \exists\, i \text{ s.t. } y_i(x_m) \ge y_j(x_m) + n - n_j(x_m)\ \forall\, j \ne i\}$$

and the set of continuation states at stage $m$ by $C^n_m = \Omega^n_m \setminus S^n_m$. Let

$$S^n = \bigcup_{m=0}^{kn} S^n_m \quad \text{and} \quad C^n = \bigcup_{m=0}^{kn} C^n_m.$$
Define the probability of reaching state $x_m$ along any feasible path using sampling rule $R$ by

$$P_R(x_m) = \alpha_R(x_m)\, P(x_m),$$

where

$$P(x_m) = \prod_{i=1}^k p_i^{y_i(x_m)} (1-p_i)^{n_i(x_m) - y_i(x_m)}$$

and $\alpha_R(x_m)$ is the randomization coefficient employed by $R$.

If it is not possible to reach state $x_m$ using $R$, then clearly $P_R(x_m) = 0$; in this situation define $\alpha_R(x_m) = 0$. Note that $\alpha_R(x_m)$ depends only on the rule $R$ and, in particular, is independent of the $p_i$'s; $\alpha_R(x_m)$ can, and usually does, depend on the data.

Example A.1: Suppose that $R$ is PW for $k = 2$ with the following modification: if at any stage the number of successes and the number of failures are the same for both populations, then take the next observation from one of them at random.

Let $n = 4$, $x_5 = (2,3; 1,2)$. It is possible to reach state $x_5$ using different sampling paths; two such paths are

$$s_1 = S_1\, S_1\, F_1\, S_2\, F_2 \quad \text{and} \quad s_2 = S_1\, F_1\, S_2\, F_2\, S_1.$$

Now

$$P\{\text{reaching state } x_5 \text{ along } s_1\} = \tfrac12\, p_1^2 (1-p_1)\, p_2 (1-p_2) = \tfrac12 P(x_5) \tag{A.1}$$

and

$$P\{\text{reaching state } x_5 \text{ along } s_2\} = \tfrac14\, p_1^2 (1-p_1)\, p_2 (1-p_2) = \tfrac14 P(x_5). \tag{A.2}$$
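The path coefficients in (A.1) and (A.2), and their sum $\alpha_R(x_5)$, can be computed mechanically by replaying each path through the rule and multiplying the randomization probabilities used along the way. The sketch below is our own illustration with hypothetical names. It ignores the stopping rule (only the sampling rule enters $\alpha_R$ here), and its handling of a population that already has $n$ observations is an assumption that does not affect this example.

```python
from fractions import Fraction
from itertools import product


def allowed(state, last, n):
    """Populations (0 or 1) the modified PW rule of Example A.1 may sample next."""
    (z1, m1), (z2, m2) = state
    if (z1, m1 - z1) == (z2, m2 - z2):
        return [0, 1]                     # same successes and failures: randomize
    pop, success = last                   # not None: the start is always a tie
    target = pop if success else 1 - pop  # PW: stay after S, switch after F
    if state[target][1] >= n:             # assumed truncation convention
        target = 1 - target
    return [target]


def path_coefficient(path, n):
    """Randomization coefficient of a path such as "S1 F1 S2" (0 if infeasible)."""
    state = [(0, 0), (0, 0)]
    last, coef = None, Fraction(1)
    for token in path.split():
        outcome, pop = token[0], int(token[1]) - 1
        opts = allowed(tuple(state), last, n)
        if pop not in opts or state[pop][1] >= n:
            return Fraction(0)
        coef *= Fraction(1, len(opts))
        z, m = state[pop]
        state[pop] = (z + (outcome == "S"), m + 1)
        last = (pop, outcome == "S")
    return coef


def alpha(target, n):
    """alpha_R(x_m): sum of the coefficients of all paths ending at target."""
    m = sum(obs for _, obs in target)
    total = Fraction(0)
    for tokens in product(("S1", "F1", "S2", "F2"), repeat=m):
        state = [(0, 0), (0, 0)]
        for t in tokens:
            i = int(t[1]) - 1
            state[i] = (state[i][0] + (t[0] == "S"), state[i][1] + 1)
        if tuple(state) == target:
            total += path_coefficient(" ".join(tokens), n)
    return total


if __name__ == "__main__":
    print(path_coefficient("S1 S1 F1 S2 F2", 4), path_coefficient("S1 F1 S2 F2 S1", 4))
    print(alpha(((2, 3), (1, 2)), 4))
```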
Thus we see that, for the particular paths of our example, the coefficients in (A.1) and (A.2) depend on the sampling rule $R$ and on the path followed. For any particular $R$, $\alpha_R(x_m)$ is the sum of all such coefficients. To illustrate this latter point consider the following example.


Example A.2: Let $R = R_{SS}$. Here

$$P_R(x_{kn}) = \alpha_R(x_{kn})\, P(x_{kn}),$$

where

$$P(x_{kn}) = \prod_{i=1}^k p_i^{y_i(x_{kn})} (1-p_i)^{n - y_i(x_{kn})}$$

and

$$\alpha_R(x_{kn}) = \prod_{i=1}^k \binom{n}{y_i(x_{kn})}.$$


Let $K = \{1,2,\ldots,k\}$ and $A \subseteq K$, $A \ne \emptyset$. Let $W_A$ denote the set of all stopping states $x_{kn}$ which would lead to selection at random from among the populations in $A$, using $(R_{SS},T_{SS})$, i.e.,

$$W_A = \Big\{x_{kn} \in \Omega^n_{kn} : y_i(x_{kn}) = \max_{1 \le j \le k} y_j(x_{kn})\ \forall\, i \in A;\ \ y_i(x_{kn}) < \max_{1 \le j \le k} y_j(x_{kn})\ \forall\, i \notin A\Big\}.$$

Next let $V_A$ denote the set of all stopping states $x_m$ which, for some $m$, lead to selection at random from the populations in $A$, using $(R,S^*,T^*)$, i.e.,

$$V_A = \{x_m \in S^n_m\ (1 \le m \le kn) : y_i(x_m) \ge y_j(x_m) + n - n_j(x_m)\ \forall\, j \ne i \text{ iff } i \in A\}.$$
Let $x_m \in \Omega^n_m$ ($0 \le m \le kn$) and $x_{kn} \in \Omega^n_{kn}$. We will say that $x_m \subset x_{kn}$ if it is possible to augment $x_m$ by the remaining observations which have not yet been taken to obtain $x_{kn}$, i.e., if

$$0 \le y_j(x_{kn}) - y_j(x_m) \le n - n_j(x_m) \qquad (1 \le j \le k).$$

Example A.3: Let $k = 2$, $n = 2$ and suppose that $x_2 = (0,1; 1,1) \in \Omega^2_2$ and $x_4 = (1,2; 1,2) \in \Omega^2_4$. Then $x_2 \subset x_4$ since

$$0 \le y_j(x_4) - y_j(x_2) \le 2 - n_j(x_2) \qquad (j = 1,2).$$

Let $x_m \subset x_{kn}$. Define the "difference" between the original state and the augmented state by

$$d(x_m,x_{kn}) = \{y_1(d(x_m,x_{kn})), n_1(d(x_m,x_{kn})); \ldots; y_k(d(x_m,x_{kn})), n_k(d(x_m,x_{kn}))\},$$

where

$$y_j(d(x_m,x_{kn})) = y_j(x_{kn}) - y_j(x_m), \qquad n_j(d(x_m,x_{kn})) = n - n_j(x_m) \qquad (1 \le j \le k);$$

thus

$$P(d(x_m,x_{kn})) = \prod_{i=1}^k p_i^{y_i(x_{kn}) - y_i(x_m)} (1-p_i)^{n - n_i(x_m) - y_i(x_{kn}) + y_i(x_m)}$$

and

$$P(x_m)\, P(d(x_m,x_{kn})) = \prod_{i=1}^k p_i^{y_i(x_{kn})} (1-p_i)^{n - y_i(x_{kn})} = P(x_{kn}). \tag{A.3}$$


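Since $p_k$ is the largest of the $p$'s and $k \in A$ implies correct selection with probability $1/|A|$, we can write

$$P\{CS \mid (R_{SS},T_{SS})\} = \sum_{A:\, A \subseteq K,\ k \in A}\ \sum_{x_{kn} \in W_A} \frac{1}{|A|}\, P_{R_{SS}}(x_{kn}),$$

$$P\{CS \mid (R,S^*,T^*)\} = \sum_{A:\, A \subseteq K,\ k \in A}\ \sum_{x_m \in V_A} \frac{1}{|A|}\, P_R(x_m).$$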


For $x_m \in S^n$ let $E^n(x_m)$ denote the set of all possible augmentations $x_{kn}$ of $x_m$, i.e.,

$$E^n(x_m) = \{x_{kn} \in \Omega^n_{kn} : x_m \subset x_{kn}\}.$$


To  prove  Theorem  5.1  we  shall  use  the  following  lemmas. 


Lemma A.1. Let $x_m \in S^n_m$. Then

$$\sum_{x_{kn} \in E^n(x_m)}\ \prod_{i=1}^k \binom{n - n_i(x_m)}{y_i(x_{kn}) - y_i(x_m)}\, P(d(x_m,x_{kn})) = 1.$$

Proof: When we consider all possible augmentations of $x_m$ we have $k$ independent binomial distributions with parameters $(n - n_i(x_m), p_i)$ ($1 \le i \le k$). Hence the l.h.s. equals

$$\sum_{y_1=0}^{n-n_1(x_m)} \cdots \sum_{y_k=0}^{n-n_k(x_m)}\ \prod_{i=1}^k \binom{n-n_i(x_m)}{y_i} p_i^{y_i}(1-p_i)^{n-n_i(x_m)-y_i} = \prod_{i=1}^k\ \sum_{y_i=0}^{n-n_i(x_m)} \binom{n-n_i(x_m)}{y_i} p_i^{y_i}(1-p_i)^{n-n_i(x_m)-y_i} = 1. \qquad \square$$


herein  A. 2.  Lot  x  •  V.  and  suppose  that  x  c  x,  .  Then 
-  m  A  m  kn 

x,  t  u  Wg.  In  other  words,  if  x  is  a  stopping  state  using  R 
'n  BoA  m 

which  leads  to  selection  among  the  populations  in  A,  then  if  x^ 

is  augmented  to  obtain  x  ,  we  wall  at  most  randomize  among  the 

elements  of  a  superset  of  A. 


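Proof:

$$x_m \in V_A \;\Rightarrow\; y_i(x_m) \ge y_j(x_m) + (n - n_j(x_m)) \qquad \forall j \ne i,\ \forall i \in A.$$

Hence, if $x_m \subset x_{kn}$, then, since $y_j(x_{kn}) - y_j(x_m) \le n - n_j(x_m)$,

$$y_i(x_{kn}) \ge y_i(x_m) \ge y_j(x_m) + (n - n_j(x_m)) \ge y_j(x_m) + (y_j(x_{kn}) - y_j(x_m)) = y_j(x_{kn}),$$

i.e., $y_i(x_{kn}) \ge y_j(x_{kn})$ $\forall j \ne i$, $\forall i \in A$. Thus it follows that $y_i(x_{kn}) = \max_{1 \le j \le k} y_j(x_{kn})$ at least for all $i \in A$, and possibly for some $i \notin A$. Hence, $x_{kn} \in W_B$ for some $B \supseteq A$. Therefore,

$$x_m \in V_A,\ x_m \subset x_{kn} \;\Rightarrow\; x_{kn} \in \bigcup_{B \supseteq A} W_B.$$

□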
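The inequality chain in the proof of Lemma A.2 can be checked by brute force on small cases. The sketch below assumes that $x_m \in V_A$ is characterized by $y_i(x_m) \ge y_j(x_m) + (n - n_j(x_m))$ for all $j \ne i$ and all $i \in A$, the property used in the proof; the helper names are hypothetical and populations are indexed from 0. It enumerates every augmentation $x_{kn}$ of $x_m$ and collects the set $B$ of populations attaining $\max_j y_j(x_{kn})$.

```python
from itertools import product


def in_v_a(n_obs, y, n, A):
    """Inequality used in the proof of Lemma A.2 (assumed characterization of V_A)."""
    k = len(y)
    return all(y[i] >= y[j] + (n - n_obs[j]) for i in A for j in range(k) if j != i)


def winner_sets(n_obs, y, n):
    """Set B of maximizers of y_i(x_kn) for every augmentation x_kn of x_m."""
    k = len(y)
    sets = set()
    for dy in product(*(range(n - n_obs[i] + 1) for i in range(k))):
        final = [y[i] + dy[i] for i in range(k)]
        top = max(final)
        sets.add(frozenset(i for i in range(k) if final[i] == top))
    return sets


def lemma_a2_holds(n_obs, y, n, A):
    """True if every augmentation of x_m lands in some W_B with B a superset of A."""
    if not in_v_a(n_obs, y, n, A):
        raise ValueError("x_m does not satisfy the V_A inequality")
    return all(frozenset(A) <= B for B in winner_sets(n_obs, y, n))


# Population 0 leads and cannot be overtaken; population 1 can still tie it.
print(lemma_a2_holds((4, 2, 3), (3, 1, 0), n=4, A={0}))  # True
print(sorted(sorted(B) for B in winner_sets((4, 2, 3), (3, 1, 0), n=4)))  # [[0], [0, 1]]
```

In the example both $B = \{0\}$ and $B = \{0, 1\}$ occur, illustrating that the terminal randomization may be over a strict superset of $A$.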


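Note: If $x_{kn} \in W_A$, and if $x_m \in S_m$ is s.t. $x_m \subset x_{kn}$ for some $m$, then $x_m \in \bigcup_{B \subseteq A} V_B$. (By Lemma A.2, $x_m \in V_B$ implies $x_{kn} \in W_{B'}$ for some $B' \supseteq B$, and $B' = A$ because the sets $W_{B'}$ are disjoint.)

Lemma A.3. Let $x_{kn}$ be arbitrary and suppose $R$ is any sampling rule which takes no more than $n$ observations per population. Then,

$$\sum_{x_m \subset x_{kn}} \alpha_R(x_m) \prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)} = \prod_{i=1}^{k}\binom{n}{y_i(x_{kn})},$$

where the sum is over all stopping states $x_m$ with $x_m \subset x_{kn}$.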


Proof: Consider an arbitrary $x_{kn}$ and the possible stopping states $x_m \subset x_{kn}$ using $(R,S^*,T^*)$. Clearly, if the $kn$ observations yield $x_{kn}$, sampling using $(R,S^*,T^*)$ terminates at exactly one of these states, so consider

$$\begin{aligned}
1 &= P\{\text{termination using } (R,S^*,T^*) \mid x_{kn}\}\\
&= \sum_{x_m\subset x_{kn}} \frac{P\{\text{terminate at state } x_m;\ \text{complete } x_m \text{ to } x_{kn}\}}{\left[\prod_{i=1}^{k}\binom{n}{y_i(x_{kn})}\right]P(x_{kn})}\\
&= \sum_{x_m\subset x_{kn}} \frac{P\{\text{terminate at state } x_m\}\,P\{\text{complete } x_m \text{ to } x_{kn} \mid x_m\}}{\left[\prod_{i=1}^{k}\binom{n}{y_i(x_{kn})}\right]P(x_{kn})}\\
&= \sum_{x_m\subset x_{kn}} \frac{\alpha_R(x_m)\,P(x_m)\left[\prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)}\right]P(d(x_m,x_{kn}))}{\left[\prod_{i=1}^{k}\binom{n}{y_i(x_{kn})}\right]P(x_{kn})}.
\end{aligned}$$

Using equation (A.3) we obtain

$$\sum_{x_m\subset x_{kn}} \alpha_R(x_m)\prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)} = \prod_{i=1}^{k}\binom{n}{y_i(x_{kn})}.$$

□
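Lemma A.3 holds for any sampling rule taking at most $n$ observations per population, so it can be checked exactly once the coefficients $\alpha_R(x_m)$ are available. The sketch below assumes (i) a stopping rule of the curtailment form used in the proof of Lemma A.2, i.e., stop as soon as some population $i$ satisfies $y_i \ge y_j + (n - n_j)$ for all $j \ne i$, and (ii) a hypothetical sampling rule that picks uniformly at random among populations with fewer than $n$ observations (this is not $R^*$). It takes $\alpha_R(x_m)$ to be the sum, over all sample paths stopping at $x_m$, of the products of the sampling probabilities along the path, which is consistent with (A.5); all function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb, prod


def stop_set(n_obs, y, n):
    """Populations i with y_i >= y_j + (n - n_j) for all j != i (assumed form of S*)."""
    k = len(y)
    return [i for i in range(k)
            if all(y[i] >= y[j] + (n - n_obs[j]) for j in range(k) if j != i)]


def alpha_coefficients(k, n):
    """alpha_R(x_m) for every stopping state, for a uniformly random sampling rule.

    A state is a pair (n_obs, y) of count tuples. weight[state] accumulates the
    product of sampling probabilities over all paths reaching that state, so that
    P{terminate at x_m} = alpha_R(x_m) * P(x_m), as in (A.5).
    """
    weight = defaultdict(Fraction)
    weight[((0,) * k, (0,) * k)] = Fraction(1)
    alpha = {}
    for m in range(k * n + 1):  # process states stage by stage
        for (n_obs, y) in [s for s in list(weight) if sum(s[0]) == m]:
            w = weight[(n_obs, y)]
            if stop_set(n_obs, y, n):
                alpha[(n_obs, y)] = w
                continue
            open_pops = [i for i in range(k) if n_obs[i] < n]
            for i in open_pops:
                for success in (0, 1):
                    n2 = tuple(c + (1 if j == i else 0) for j, c in enumerate(n_obs))
                    y2 = tuple(c + (success if j == i else 0) for j, c in enumerate(y))
                    weight[(n2, y2)] += w / len(open_pops)
    return alpha


def lemma_a3_holds(k, n):
    alpha = alpha_coefficients(k, n)
    for y_full in product(range(n + 1), repeat=k):  # every x_kn
        lhs = Fraction(0)
        for (n_obs, y), a in alpha.items():
            # x_m is contained in x_kn iff the missing successes fit in the remaining trials
            if all(0 <= y_full[i] - y[i] <= n - n_obs[i] for i in range(k)):
                lhs += a * prod(comb(n - n_obs[i], y_full[i] - y[i]) for i in range(k))
        if lhs != prod(comb(n, v) for v in y_full):
            return False
    return True


print(lemma_a3_holds(k=3, n=2))  # True
```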


Lemma A.4. Let $x_{kn} \in W_B$, $k \in B$. Then,

$$\sum_{A:\,A\subseteq B,\,k\in A} \frac{1}{|A|} \sum_{\substack{x_m:\ x_m\subset x_{kn}\\ x_m\in V_A}} \alpha_R(x_m)\prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)} = \frac{1}{|B|}\prod_{i=1}^{k}\binom{n}{y_i(x_{kn})}. \tag{A.6}$$


Proof: If $|B| = 1$, then the Lemma is true by Lemma A.3, since $A \subseteq B$, $k \in A \Rightarrow A = B$, and, by the Note following Lemma A.2, every stopping state $x_m \subset x_{kn}$ lies in $V_B$.


Suppose $|B| > 1$.

Let $r \in B$, $r \ne k$. Now, $r, k \in B$ and $x_{kn} \in W_B$

$$\Rightarrow\ y_r(x_{kn}) = y_k(x_{kn}) = \max_{1 \le j \le k} y_j(x_{kn}).$$

Since $y_r(x_{kn}) = y_k(x_{kn})$, and recalling that the pairing of the $p_{[i]}$ with the $\Pi_j$ $(1 \le i, j \le k)$ is completely unknown (and that the populations are tagged in such a way that their ordering is unique), it can be seen that interchanging the labels $r$ and $k$ would lead to the same stopping states contained in $x_{kn}$ with the same randomization coefficients. Thus


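$$\begin{aligned}
&\sum_{A:\,A\subseteq B,\,k\in A}\frac{1}{|A|}\sum_{\substack{x_m\subset x_{kn}\\ x_m\in V_A}}\alpha_R(x_m)\prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)}\\
&\qquad= \sum_{A:\,A\subseteq B,\,r\in A}\frac{1}{|A|}\sum_{\substack{x_m\subset x_{kn}\\ x_m\in V_A}}\alpha_R(x_m)\prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)}.
\end{aligned}\tag{A.7}$$

Hence, the l.h.s. of (A.6) equals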


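$$\frac{1}{|B|}\sum_{r\in B}\ \sum_{A:\,A\subseteq B,\,k\in A}\frac{1}{|A|}\sum_{\substack{x_m\subset x_{kn}\\ x_m\in V_A}}\alpha_R(x_m)\prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)}$$

(the summand does not depend on $r$), which, using (A.7), equals

$$\frac{1}{|B|}\sum_{r\in B}\ \sum_{A:\,A\subseteq B,\,r\in A}\frac{1}{|A|}\sum_{\substack{x_m\subset x_{kn}\\ x_m\in V_A}}\alpha_R(x_m)\prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)}.$$

Interchanging the sums over $r$ and $A$, the above equals

$$\begin{aligned}
&\frac{1}{|B|}\sum_{A:\,A\subseteq B}\ \sum_{r\in A}\frac{1}{|A|}\sum_{\substack{x_m\subset x_{kn}\\ x_m\in V_A}}\alpha_R(x_m)\prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)}\\
&\qquad= \frac{1}{|B|}\sum_{A:\,A\subseteq B}\ \sum_{\substack{x_m\subset x_{kn}\\ x_m\in V_A}}\alpha_R(x_m)\prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)}\\
&\qquad= \frac{1}{|B|}\sum_{x_m\subset x_{kn}}\alpha_R(x_m)\prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)} \qquad\text{by the Note following Lemma A.2,}\\
&\qquad= \frac{1}{|B|}\prod_{i=1}^{k}\binom{n}{y_i(x_{kn})} \qquad\text{by Lemma A.3.}
\end{aligned}$$

□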
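Identity (A.6) can be checked in the same way as Lemma A.3. The sketch below keeps the same assumptions: the curtailment form of $S^*$ from the proof of Lemma A.2, uniform randomization over the stop set for $T^*$, and a hypothetical sampling rule that is uniform over unfinished populations. Because that rule treats all labels alike, it has the label-interchange property used to obtain (A.7). The argument `best` plays the role of the label $k$ (indices start at 0), and the code also asserts the inclusion $A \subseteq B$ from the Note following Lemma A.2. Function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb, prod


def stop_set(n_obs, y, n):
    """Populations i with y_i >= y_j + (n - n_j) for all j != i (assumed form of S*)."""
    k = len(y)
    return [i for i in range(k)
            if all(y[i] >= y[j] + (n - n_obs[j]) for j in range(k) if j != i)]


def alpha_coefficients(k, n):
    """alpha_R(x_m) at each stopping state for a uniformly random sampling rule."""
    weight = defaultdict(Fraction)
    weight[((0,) * k, (0,) * k)] = Fraction(1)
    alpha = {}
    for m in range(k * n + 1):
        for (n_obs, y) in [s for s in list(weight) if sum(s[0]) == m]:
            w = weight[(n_obs, y)]
            if stop_set(n_obs, y, n):
                alpha[(n_obs, y)] = w
                continue
            open_pops = [i for i in range(k) if n_obs[i] < n]
            for i in open_pops:
                for success in (0, 1):
                    n2 = tuple(c + (1 if j == i else 0) for j, c in enumerate(n_obs))
                    y2 = tuple(c + (success if j == i else 0) for j, c in enumerate(y))
                    weight[(n2, y2)] += w / len(open_pops)
    return alpha


def lemma_a4_holds(k, n, best):
    """Check (A.6) for every x_kn in some W_B with best in B."""
    alpha = alpha_coefficients(k, n)
    for y_full in product(range(n + 1), repeat=k):
        top = max(y_full)
        B = {i for i in range(k) if y_full[i] == top}
        if best not in B:
            continue
        lhs = Fraction(0)
        for (n_obs, y), a in alpha.items():
            if not all(0 <= y_full[i] - y[i] <= n - n_obs[i] for i in range(k)):
                continue  # x_m is not contained in x_kn
            A = set(stop_set(n_obs, y, n))
            assert A <= B  # Note following Lemma A.2
            if best in A:
                coeff = prod(comb(n - n_obs[i], y_full[i] - y[i]) for i in range(k))
                lhs += a * coeff / len(A)
        if lhs != Fraction(prod(comb(n, v) for v in y_full), len(B)):
            return False
    return True


print(lemma_a4_holds(k=3, n=2, best=2))  # True
```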

We  now  proceed  with  our  proof  of  Theorem  5.1. 
Proof: From equation (A.5) we have

$$P\{CS \mid (R,S^*,T^*)\} = \sum_{A:\,A\subseteq K,\,k\in A}\frac{1}{|A|}\sum_{x_m\in V_A}\alpha_R(x_m)\,P(x_m).$$

Using Lemma A.1 we can write this as

$$\sum_{A:\,A\subseteq K,\,k\in A}\frac{1}{|A|}\sum_{x_m\in V_A}\alpha_R(x_m)\,P(x_m)\sum_{x_{kn}\in E_n(x_m)}\prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)}P(d(x_m,x_{kn})).$$

Using Lemma A.2, this equals

$$\sum_{A:\,A\subseteq K,\,k\in A}\frac{1}{|A|}\sum_{x_m\in V_A}\alpha_R(x_m)\sum_{B:\,B\supseteq A}\ \sum_{\substack{x_{kn}\in W_B\\ x_{kn}\supset x_m}}\prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)}P(x_m)\,P(d(x_m,x_{kn})),$$

which, using (A.3) and interchanging the order of summation (note that $k \in A \subseteq B$ implies $k \in B$), can be written as

$$\sum_{B:\,B\subseteq K,\,k\in B}\ \sum_{x_{kn}\in W_B}\left[\sum_{A:\,A\subseteq B,\,k\in A}\frac{1}{|A|}\sum_{\substack{x_m\subset x_{kn}\\ x_m\in V_A}}\alpha_R(x_m)\prod_{i=1}^{k}\binom{n-n_i(x_m)}{y_i(x_{kn})-y_i(x_m)}\right]P(x_{kn}).$$

Finally, using Lemma A.4, this reduces to

$$\sum_{B:\,B\subseteq K,\,k\in B}\frac{1}{|B|}\sum_{x_{kn}\in W_B}\left[\prod_{i=1}^{k}\binom{n}{y_i(x_{kn})}\right]P(x_{kn}) = P\{CS \mid \text{single-stage procedure}\}$$

by (A.4). □
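Theorem 5.1 can be illustrated by computing both sides exactly for small $k$ and $n$. The sketch below evaluates $P\{CS\}$ for the sequential procedure by backward recursion over count states, and $P\{CS\}$ for the single-stage procedure directly from (A.4). It assumes the curtailment form of $S^*$ stated in the proof of Lemma A.2, uniform randomization over the stop set for $T^*$, and a hypothetical sampling rule that is uniform over unfinished populations (not $R^*$). It also assumes a unique largest $p_i$. Function names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product
from math import comb


def stop_set(n_obs, y, n):
    """Populations i with y_i >= y_j + (n - n_j) for all j != i (assumed form of S*)."""
    k = len(y)
    return [i for i in range(k)
            if all(y[i] >= y[j] + (n - n_obs[j]) for j in range(k) if j != i)]


def pcs_sequential(p, n):
    """Exact P{CS} using the assumed S*, uniform randomization over the stop set,
    and a hypothetical sampling rule that is uniform over unfinished populations."""
    k = len(p)
    best = max(range(k), key=lambda i: p[i])  # assumes a unique largest p_i

    @lru_cache(maxsize=None)
    def value(n_obs, y):
        A = stop_set(n_obs, y, n)
        if A:  # stop and randomize uniformly among A
            return Fraction(1, len(A)) if best in A else Fraction(0)
        open_pops = [i for i in range(k) if n_obs[i] < n]
        total = Fraction(0)
        for i in open_pops:
            for success, prob in ((1, p[i]), (0, 1 - p[i])):
                n2 = tuple(c + (1 if j == i else 0) for j, c in enumerate(n_obs))
                y2 = tuple(c + (success if j == i else 0) for j, c in enumerate(y))
                total += prob * value(n2, y2) / len(open_pops)
        return total

    return value((0,) * k, (0,) * k)


def pcs_single_stage(p, n):
    """Exact P{CS} for the single-stage procedure, as in (A.4)."""
    k = len(p)
    best = max(range(k), key=lambda i: p[i])
    total = Fraction(0)
    for y in product(range(n + 1), repeat=k):
        prob = Fraction(1)
        for i in range(k):
            prob *= comb(n, y[i]) * p[i] ** y[i] * (1 - p[i]) ** (n - y[i])
        winners = [i for i in range(k) if y[i] == max(y)]
        if best in winners:
            total += prob / len(winners)
    return total


p = (Fraction(1, 5), Fraction(1, 2), Fraction(7, 10))
print(pcs_sequential(p, 3) == pcs_single_stage(p, 3))  # True
```

Exact rational arithmetic is used so that the comparison tests equality rather than closeness.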

